Manning
Shelter Island
For more information on this and other Manning titles go to manning.com.
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact
Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com
©2025 by Manning Publications Co. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.
Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.
The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Development editor: Becky Whitney
Technical editor: Ross Turner
Review editor: Kishor Rit
Production editor: Keri Hales
Copy editor: Julie McNamee
Proofreader: Katie Tennant
Technical proofreader: Ross Turner
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor
ISBN: 9781633436343
Printed in the United States of America
I dedicate this book to all the readers who embark on this journey with me.
Books are a powerful way for an author to connect with readers on a deeply personal
level, chapter by chapter, page by page. In that shared experience of learning,
exploring, and growing together, I find true meaning. May this book inspire you
and challenge you, and help you see the incredible potential that AI agents hold—
not just for the future but also for today.
My journey into the world of intelligent systems began back in the early 1980s. Like many people then, I believed artificial intelligence (AI) was just around the corner. It always seemed like one more innovation and technological leap would lead us to the intelligence we imagined. But that leap never came.
Perhaps the promise of HAL, from Stanley Kubrick’s 2001: A Space Odyssey, captivated me with the idea of a truly intelligent computer companion. After years of effort, trial, and countless errors, I began to understand that creating AI was far more complex than we humans had imagined. In the early 1990s, I shifted my focus, applying my skills to more tangible goals in other industries.
Not until the late 1990s, after experiencing a series of challenging and transformative events, did I realize my passion for building intelligent systems. I knew these systems might never reach the superintelligence of HAL, but I was okay with that. I found fulfillment in working with machine learning and data science, creating models that could learn and adapt. For more than 20 years, I thrived in this space, tackling problems that required creativity, precision, and a sense of possibility.
During that time, I worked on everything from genetic algorithms for predicting unknown inputs to developing generative learning models for horizontal drilling in the oil-and-gas sector. These experiences led me to write, where I shared my knowledge by way of books on various topics—reverse-engineering Pokémon Go, building augmented and virtual reality experiences, designing audio for games, and applying reinforcement learning to create intelligent agents. I spent years knuckles-deep in code, developing agents in Unity ML-Agents and deep reinforcement learning.
Even then, I never imagined that one day I could simply describe what I wanted to an AI model, and it would make it happen. I never imagined that, in my lifetime, I would be able to collaborate with an AI as naturally as I do today. And I certainly never imagined how fast—and simultaneously how slow—this journey would feel.
In November 2022, the release of ChatGPT changed everything. It changed the world’s perception of AI, and it changed the way we build intelligent systems. For me, it also altered my perspective on the capabilities of these systems. Suddenly, the idea of agents that could autonomously perform complex tasks wasn’t just a far-off dream but instead a tangible, achievable reality. In some of my earlier books, I had described agentic systems that could undertake specific tasks, but now, those once-theoretical ideas were within reach.
This book is the culmination of my decades of experience in building intelligent systems, but it’s also a realization of the dreams I once had about what AI could become. AI agents are here, poised to transform how we interact with technology, how we work, and, ultimately, how we live.
Yet, even now, I see hesitation from organizations when it comes to adopting agentic systems. I believe this hesitation stems not from fear of AI but rather from a lack of understanding and expertise in building these systems. I hope that this book helps to bridge that gap. I want to introduce AI agents as tools that can be accessible to everyone—tools we shouldn’t fear but instead respect, manage responsibly, and learn to work with in harmony.
I want to extend my deepest gratitude to the machine learning and deep learning communities for their tireless dedication and incredible work. Just a few short years ago, many questioned whether the field was headed for another AI winter—a period of stagnation and doubt. But thanks to the persistence, brilliance, and passion of countless individuals, the field not only persevered but also flourished. We’re standing on the threshold of an AI-driven future, and I am endlessly grateful for the contributions of this talented community.
Writing a book, even with the help of AI, is no small feat. It takes dedication, collaboration, and a tremendous amount of support. I am incredibly thankful to the team of editors and reviewers who made this book possible. I want to express my heartfelt thanks to everyone who took the time to review and provide feedback. In particular, I want to thank Becky Whitney, my content editor, and Ross Turner, my technical editor and chief production and technology officer at OpenSC, for their dedication, as well as the whole production team at Manning for their insight and unwavering support throughout this journey.
To my partner, Rhonda—your love, patience, and encouragement mean the world to me. You’ve been the cornerstone of my support system, not just for this book but for all the books that have come before. I truly couldn’t have done any of this without you. Thank you for being my rock, my partner, and my inspiration.
Many of the early ideas for this book grew out of my work at Symend. It was during my time there that I first began developing the concepts and designs for agentic systems that laid the foundation for this book. I am deeply grateful to my colleagues at Symend for their collaboration and contributions, including Peh Teh, Andrew Wright, Ziko Rajabali, Chris Garrett, Kouros, Fatemeh Torabi Asr, Sukh Singh, and Hanif Joshaghani. Your insights and hard work helped bring these ideas to life, and I am honored to have worked alongside such an incredible group of people.
Finally, I would like to thank all the reviewers: Anandaganesh Balakrishnan, Aryan Jadon, Chau Giang, Dan Sheikh, David Curran, Dibyendu Roy Chowdhury, Divya Bhargavi, Felipe Provezano Coutinho, Gary Pass, John Williams, Jose San Leandro, Laurence Giglio, Manish Jain, Maxim Volgin, Michael Wang, Mike Metzger, Piti Champeethong, Prashant Dwivedi, Radhika Kanubaddhi, Rajat Kant Goel, Ramaa Vissa, Richard Vaughan, Satej Kumar Sahu, Sergio Gtz, Siva Dhandapani, Annamaneni Sriharsha, Sri Ram Macharla, Sumit Bhattacharyya, Tony Holdroyd, Vidal Graupera, Vidhya Vinay, and Vinoth Nageshwaran. Your suggestions helped make this a better book.
AI Agents in Action is about building and working with intelligent agent systems—not just creating autonomous entities but also developing agents that can effectively tackle and solve real-world problems. The book starts with the basics of working with large language models (LLMs) to build assistants, multi-agent systems, and agentic behavioral agents. From there, it explores the key components of agentic systems: retrieval systems for knowledge and memory augmentation, action and tool usage, reasoning, planning, evaluation, and feedback. The book demonstrates how these components empower agents to perform a wide range of complex tasks through practical examples.
This journey isn’t just about technology; it’s about reimagining how we approach problem solving. I hope this book inspires you to see intelligent agents as partners in innovation, capable of transforming ideas into actions in ways that were once thought impossible. Together, we’ll explore how AI can augment human potential, enabling us to achieve far more than we could alone.
This book is for anyone curious about intelligent agents and how to develop agentic systems—whether you’re building your first helpful assistant or diving deeper into complex multi-agent systems. No prior experience with agents, agentic systems, prompt engineering, or working with LLMs is required. All you need is a basic understanding of Python and familiarity with GitHub repositories. My goal is to make these concepts accessible and engaging, empowering anyone who wants to explore the world of AI agents to do so with confidence.
Whether you’re a developer, researcher, or hobbyist or are simply intrigued by the possibilities of AI, this book is for you. I hope that in these pages you’ll find inspiration, practical guidance, and a new appreciation for the remarkable potential of intelligent agents. Let this book guide understanding, creating, and unleashing the power of AI agents in action.
This book has 11 chapters. Chapter 1, “Introduction to agents and their world,” begins by laying a foundation with fundamental definitions of large language models, chat systems, assistants, and autonomous agents. As the book progresses, the discussion shifts to the key components that make up an agent and how these components work together to create truly effective systems. Here is a quick summary of chapters 2 through 11:
The code for this book is spread across several open source projects, many of which are hosted by me or by other organizations in GitHub repositories. Throughout this book, I strive to make the content as accessible as possible, taking a low-code approach to help you focus on core concepts. Many chapters demonstrate how simple prompts can generate meaningful code, showcasing the power of AI-assisted development.
Additionally, you’ll find a variety of assistant profiles and multi-agent systems that demonstrate how to solve real-world problems using generated code. These examples are meant to inspire, guide, and empower you to explore what is possible with AI agents. I am deeply grateful to the many contributors and the community members who have collaborated on these projects, and I encourage you to explore the repositories, experiment with the code, and adapt it to your own needs. This book is a testament to the power of collaboration and the incredible things we can achieve together.
This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes, some of the code is typeset in bold to highlight code that has changed from previous steps in the chapter, such as when a feature is added to an existing line of code. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this wasn’t enough, and listings include line-continuation markers (↪). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.
You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/ai-agents-in-action. The complete code for the examples in the book is available for download from the Manning website at www.manning.com/books/ai-agents-in-action. In addition, the code developed for this book has been placed in three GitHub repositories that are all publicly accessible:
Purchase of AI Agents in Action includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/ai-agents-in-action/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.
Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It isn’t a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
The figure on the cover of AI Agents in Action is “Clémentinien,” taken from Balthasar Hacquet’s Illustrations de L’Illyrie et la Dalmatie, published in 1815.
In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
The agent isn’t a new concept in machine learning and artificial intelligence (AI). In reinforcement learning, for instance, the word agent denotes an active decision-making and learning intelligence. In other areas, the word agent aligns more with an automated application or software that does something on your behalf.
You can consult any online dictionary to find the definition of an agent. The Merriam-Webster Dictionary defines it this way (www.merriam-webster.com/dictionary/agent):
The word agent in our journey to build powerful agents in this book uses this dictionary definition. That also means the term assistant will be synonymous with agent. Tools like OpenAI’s GPT Assistants will also fall under the AI agent blanket. OpenAI avoids the word agent because of the history of machine learning, where an agent is self-deciding and autonomous.
Figure 1.1 shows four cases where a user may interact with a large language model (LLM) directly or through an agent/assistant proxy, an agent/assistant, or an autonomous agent. These four use cases are highlighted in more detail in this list:
Figure 1.1 demonstrates the use cases for a single flow of actions on an LLM using a single agent. For more complex problems, we often break agents into profiles or personas. Each agent profile is given a specific task and executes that task with specialized tools and knowledge.
Multi-agent systems are agent profiles that work together in various configurations to solve a problem. Figure 1.2 demonstrates an example of a multi-agent system using three agents: a controller or proxy and two profile agents as workers controlled by the proxy. The coder profile on the left writes the code the user requests; on the right is a tester profile designed to write unit tests. These agents work and communicate together until they are happy with the code and then pass it on to the user.
Figure 1.2 shows one of the possibly infinite agent configurations. (In chapter 4, we’ll explore Microsoft’s open source platform, AutoGen, which supports multiple configurations for employing multi-agent systems.)
Multi-agent systems can work autonomously but may also function guided entirely by human feedback. The benefits of using multiple agents are like those of a single agent but often magnified. Where a single agent typically specializes in a single task, multi-agent systems can tackle multiple tasks in parallel. Multiple agents can also provide feedback and evaluation, reducing errors when completing assignments.
As we can see, an AI agent or agent system can be assembled in multiple ways. However, an agent itself can also be assembled using multiple components. In the next section, we’ll cover topics ranging from an agent’s profile to the actions it may perform, as well as memory and planning.
Agents can be complex units composed of multiple component systems. These components are the tools the agent employs to help it complete its goal or assigned tasks and even create new ones. Components may be simple or complex systems, typically split into five categories.
Figure 1.3 describes the major categories of components a single-agent system may incorporate. Each element will have subtypes that can define the component’s type, structure, and use. At the core of all agents is the profile and persona; extending from that are the systems and functions that enhance the agent.
The agent profile and persona shown in figure 1.4 represent the base description of the agent. The persona—often called the system prompt—guides an agent in completing tasks, learning how to respond, and other nuances. It includes elements such as the background (e.g., coder, writer) and demographics, and it can be generated through methods such as handcrafting, LLM assistance, or data-driven techniques, including evolutionary algorithms.
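A handcrafted profile can be as simple as a small data structure whose persona is rendered into a system prompt. The following sketch is illustrative only; the `AgentProfile` class and its field names are assumptions for this example, not part of any particular agent framework.

```python
from dataclasses import dataclass, field

# A minimal, hypothetical agent profile: the persona text becomes the
# system prompt, and the background and tools describe the work the
# agent will perform.
@dataclass
class AgentProfile:
    name: str
    persona: str                 # rendered into the system prompt
    background: str              # e.g., "coder", "writer"
    tools: list = field(default_factory=list)

    def system_prompt(self) -> str:
        return f"You are {self.name}, a {self.background}. {self.persona}"

coder = AgentProfile(
    name="Coder",
    persona="Write clean, well-tested Python and explain your choices.",
    background="senior Python developer",
    tools=["run_code"],
)
print(coder.system_prompt())
```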
We’ll explore how to create effective and specific agent profiles/personas through techniques such as rubrics and grounding. In addition, we’ll explain the aspects of human-formulated versus AI-formulated (LLM) profiles, including innovative techniques using data and evolutionary algorithms to build profiles.
Note The agent or assistant profile is composed of elements, including the persona. It may be helpful to think of profiles describing the work the agent/assistant will perform and the tools it needs.
Figure 1.5 demonstrates the component actions and tool use in the context of agents involving activities directed toward task completion or acquiring information. These actions can be categorized into task completion, exploration, and communication, with varying levels of effect on the agent’s environment and internal states. Actions can be generated manually, through memory recollection, or by following predefined plans, influencing the agent’s behavior and enhancing learning.
Understanding the action target helps us define clear objectives for task completion, exploration, or communication. Recognizing the action effect reveals how actions influence task outcomes, the agent’s environment, and its internal states, contributing to efficient decision making. Lastly, grasping action generation methods equips us with the knowledge to create actions manually, recall them from memory, or follow predefined plans, enhancing our ability to effectively shape agent behavior and learning processes.
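One common way to make actions concrete is a tool registry the agent dispatches into by name. This is a bare-bones sketch under stated assumptions: the decorator-based registry and the two tool names (`search_docs`, `send_message`) are invented for illustration, and each tool body is a stub for a real retrieval or communication action.

```python
# Hypothetical tool registry: the agent selects a registered action by
# name and executes it. Tool bodies are stubs for real integrations.
TOOLS = {}

def tool(fn):
    """Register a function as an agent-callable tool."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_docs(query: str) -> str:
    return f"top result for {query!r}"   # exploration/acquisition stub

@tool
def send_message(text: str) -> str:
    return f"sent: {text}"               # communication stub

def execute_action(name: str, argument: str) -> str:
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](argument)

print(execute_action("search_docs", "agent planning"))
```

In practice, the agent's LLM chooses the tool name and argument; here we call `execute_action` directly to show the dispatch mechanism.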
Figure 1.6 shows the component knowledge and memory in more detail. Agents use knowledge and memory to annotate context with the most pertinent information while limiting the number of tokens used. Knowledge and memory structures can be unified, where both subsets follow a single structure or hybrid structure involving a mix of different retrieval forms. Knowledge and memory formats can vary widely from language (e.g., PDF documents) to databases (relational, object, or document) and embeddings, simplifying semantic similarity search through vector representations or even simple lists serving as agent memories.
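The embedding-based recall described above can be shown with toy vectors. This sketch assumes hand-made three-number "embeddings" and a simple memory dictionary; a real system would use an embedding model and a vector store, and the memory strings are invented for this example.

```python
import math

# Toy agent memory: each memory maps to a hand-made embedding vector.
MEMORIES = {
    "user prefers Python":        [0.9, 0.1, 0.0],
    "project deadline is Friday": [0.0, 0.8, 0.2],
    "user dislikes Java":         [0.7, 0.0, 0.4],
}

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def recall(query_vec, top_k=1):
    """Return the top_k memories most similar to the query vector."""
    ranked = sorted(MEMORIES,
                    key=lambda m: cosine(MEMORIES[m], query_vec),
                    reverse=True)
    return ranked[:top_k]

# A query embedding close to the language-preference memories:
print(recall([0.85, 0.05, 0.1]))
```

The point of the sketch is the retrieval step itself: only the most semantically relevant memory is pulled into the agent's context, keeping token usage down.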
Figure 1.7 shows the reasoning and evaluation component of an agent system. Research and practical applications have shown that LLMs/agents can effectively reason. Reasoning and evaluation systems annotate an agent’s workflow by providing an ability to think through problems and evaluate solutions.
Figure 1.8 shows the component agent planning/feedback and its role in organizing tasks to achieve higher-level goals. It can be categorized into these two approaches:
Within planning, agents may employ single-path reasoning, sequential reasoning through each step of a task, or multipath reasoning to explore multiple strategies and save the efficient ones for future use. External planners, which can be code or other agent systems, may also play a role in orchestrating plans.
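Multipath reasoning with plan reuse can be sketched in a few lines. Everything here is an assumption for illustration: the scoring heuristic stands in for an LLM- or feedback-based evaluator, and `SAVED_PLANS` stands in for a persistent plan memory.

```python
# Hypothetical multipath planner: score several candidate plans, keep
# the best, and cache efficient plans for future reuse.
SAVED_PLANS = {}

def score(plan):
    # Toy heuristic standing in for an LLM evaluation: prefer fewer steps.
    return 1.0 / len(plan)

def plan(goal, candidates):
    if goal in SAVED_PLANS:          # reuse a previously saved efficient plan
        return SAVED_PLANS[goal]
    best = max(candidates, key=score)
    SAVED_PLANS[goal] = best         # save it for future use
    return best

steps = plan("summarize report", [
    ["load report", "chunk", "summarize chunks", "merge"],
    ["load report", "summarize"],
])
print(steps)
```

Single-path reasoning corresponds to passing one candidate; an external planner would simply replace the `plan` function with code or another agent system.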
Any of our previous agent types—the proxy agent/assistant, agent/assistant, or autonomous agent—may use some or all of these components. Even the planning component has a role outside of the autonomous agent and can effectively empower even the regular agent.
AI agents and assistants have quickly moved from the main commodity in AI research to mainstream software development. An ever-growing list of tools and platforms assist in the construction and empowerment of agents. To an outsider, it may all seem like hype intended to inflate the value of some cool but overrated technology.
During the first few months after ChatGPT’s initial release, a new discipline called prompt engineering was formed: users found that using various techniques and patterns in their prompts allowed them to generate better and more consistent output. However, users also realized that prompt engineering could only go so far.
Prompt engineering is still an excellent way to interact directly with LLMs such as ChatGPT. Over time, many users discovered that effective prompting required iteration, reflection, and more iteration. The first agent systems, such as AutoGPT, emerged from these discoveries, capturing the community’s attention.
Figure 1.9 shows the original design of AutoGPT, one of the first autonomous agent systems. The agent is designed to iterate a planned sequence of tasks that it defines by looking at the user’s goal. Through each task iteration of steps, the agent evaluates the goal and determines if the task is complete. If the task isn’t complete, the agent may replan the steps and update the plan based on new knowledge or human feedback.
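The plan/execute/evaluate/replan cycle can be sketched as a loop. This is not AutoGPT's actual code; the three helper functions are stubs for LLM calls, and the task strings are invented for illustration.

```python
# Hedged sketch of the AutoGPT-style loop: plan tasks from a goal,
# execute them one at a time, evaluate, and replan until the goal is met.
def plan_tasks(goal):
    # Stub: a real agent would ask an LLM to decompose the goal.
    return [f"research {goal}", f"draft {goal}", f"review {goal}"]

def execute(task, state):
    # Stub: a real agent would run the task with tools and record results.
    state.append(f"done: {task}")

def goal_met(goal, state):
    # Stub: a real agent would have an LLM evaluate the goal against state.
    return len(state) >= 3

def autonomous_agent(goal, max_iterations=10):
    state = []
    tasks = plan_tasks(goal)
    for _ in range(max_iterations):
        if goal_met(goal, state):
            break                        # evaluation says the goal is complete
        if not tasks:
            tasks = plan_tasks(goal)     # replan with current knowledge
        execute(tasks.pop(0), state)
    return state

print(autonomous_agent("a blog post"))
```

The `max_iterations` cap is the simplest possible guardrail; production systems layer on richer evaluation and human feedback before trusting such a loop.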
AutoGPT became the first example to demonstrate the power of using task planning and iteration with LLM models. From this and in tandem, other agent systems and frameworks exploded into the community using similar planning and task iteration systems. It’s generally accepted that planning, iteration, and repetition are the best processes for solving complex and multifaceted goals for an LLM.
However, autonomous agent systems require trust in the agent decision-making process, the guardrails/evaluation system, and the goal definition. Trust is also something that is acquired over time. Our lack of trust stems from our lack of understanding of an autonomous agent’s capabilities.
Note Artificial general intelligence (AGI) is a form of intelligence that can learn to accomplish any task a human can. Many practitioners in this new world of AI believe that AGI built on autonomous agent systems is an attainable goal.
For this reason, many of the mainstream and production-ready agent tools aren't autonomous. However, they still provide a significant benefit in managing and automating tasks using GPTs (LLMs). Because our goal in this book is to understand agents in all their forms, we'll see that many practical applications are driven by non-autonomous agents.
Agents and agent tools are only the top layer of a new software application development paradigm. We’ll look at this new paradigm in the next section.
The AI agent paradigm is not only a shift in how we work with LLMs but is also perceived as a shift in how we develop software and handle data. Software and data will no longer be interfaced using user interfaces (UIs), application programming interfaces (APIs), and specialized query languages such as SQL. Instead, they will be designed to be interfaced using natural language.
Figure 1.10 shows a high-level snapshot of what this new architecture may look like and what role AI agents play. Data, software, and applications adapt to support semantic, natural language interfaces. These AI interfaces allow agents to collect data and interact with software applications, even other agents or agent applications. This represents a new shift in how we interact with software and applications.
An AI interface is a collection of functions, tools, and data layers that expose data and applications by natural language. In the past, the word semantic has been heavily used to describe these interfaces, and even some tools use the name; however, “semantic” can also have a variety of meanings and uses. Therefore, in this book, we’ll use the term AI interface.
The construction of AI interfaces will empower agents that need to consume the services, tools, and data. With this empowerment will come increased accuracy in completing tasks and more trustworthy and autonomous applications. While an AI interface may not be appropriate for all software and data, it will dominate many use cases.
GPT agents represent an entire shift in how consumers and developers approach everything, from finding information to building software and accessing data. Almost daily, a new agent framework, component, or interface pops up on GitHub or in a research paper. This can be overwhelming and intimidating to the new user trying to grasp what agent systems are and how to use them.
The term large language models (LLMs) has now become a ubiquitous descriptor of a form of AI. These LLMs have been developed using generative pretrained transformers (GPTs). While other architectures also power LLMs, the GPT form is currently the most successful.
LLMs and GPTs are generative models, which means they are trained to generate content rather than predict or classify it. To illustrate this further, consider figure 2.1, which shows the difference between generative and predictive/classification models. Generative models create new content from the input, whereas predictive and classification models classify the input.
We can further define an LLM by its constituent parts, as shown in figure 2.2. In this diagram, data represents the content used to train the model, and architecture is an attribute of the model itself, such as the number of parameters or size of the model. Models are further trained specifically to the desired use case, including chat, completions, or instruction. Finally, fine-tuning is a feature added to models that refines the input data and model training to better match a particular use case or domain.
The transformer architecture of GPTs, which is a specific architecture of LLMs, allows the models to be scaled to billions of parameters in size. This requires these large models to be trained on terabytes of documents to build a foundation. From there, these models will be successively trained using various methods for the desired use case of the model.
ChatGPT, for example, is trained effectively on the public internet and then fine-tuned using several training strategies. The final fine-tuning training is completed using an advanced form called reinforcement learning with human feedback (RLHF). This produces a model use case called chat completions.
Chat completions LLMs are designed to improve through iteration and refinement—in other words, chatting. These models have also been benchmarked to be the best in task completion, reasoning, and planning, which makes them ideal for building agents and assistants. Completion models are trained/designed only to provide generated content on input text, so they don’t benefit from iteration.
For our journey to build powerful agents in this book, we focus on the class of LLMs called chat completions models. That, of course, doesn’t preclude you from trying other model forms for your agents. However, you may have to significantly alter the code samples provided to support other model forms.
We’ll uncover more details about LLMs and GPTs later in this chapter when we look at running an open source LLM locally. In the next section, we look at how to connect to an LLM using a growing standard from OpenAI.
Numerous AI agents and assistant projects use the OpenAI API SDK to connect to an LLM. While not a formal standard, the basic concepts describing a connection now follow the OpenAI pattern. Therefore, we must understand the core concepts of an LLM connection using the OpenAI SDK.
This chapter will look at connecting to an LLM model using the OpenAI Python SDK/package. We’ll discuss connecting to a GPT-4 model, the model response, counting tokens, and how to define consistent messages. Starting in the following subsection, we’ll examine how to use OpenAI.
To complete the exercises in this section and subsequent ones, you must set up a Python developer environment and get access to an LLM. Appendix A walks you through setting up an OpenAI account and accessing GPT-4 or other models. Appendix B demonstrates setting up a Python development environment with Visual Studio Code (VS Code), including installing needed extensions. Review these sections if you want to follow along with the scenarios.
Start by opening the source code chapter_2 folder in VS Code and creating a new Python virtual environment. Again, refer to appendix B if you need assistance.
Then, install the OpenAI and Python dot environment packages using the command in the following listing. This will install the required packages into the virtual environment.
pip install openai python-dotenv
Next, open the connecting.py file in VS Code, and inspect the code shown in listing 2.2. Be sure to set the model's name to an appropriate name, for example, gpt-4. At the time of writing, the name gpt-4-1106-preview represented GPT-4 Turbo.
connecting.py
import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  #1
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:  #2
    raise ValueError("No API key found. Please check your .env file.")

client = OpenAI(api_key=api_key)  #3

def ask_chatgpt(user_message):
    response = client.chat.completions.create(  #4
        model="gpt-4-1106-preview",
        messages=[{"role": "system",
                   "content": "You are a helpful assistant."},
                  {"role": "user", "content": user_message}],
        temperature=0.7,
    )
    return response.choices[0].message.content  #5

user = "What is the capital of France?"
response = ask_chatgpt(user)  #6
print(response)
A lot is happening here, so let’s break it down by section, starting with the beginning and loading the environment variables. In the chapter_2 folder is another file called .env, which holds environment variables. These variables are set automatically by calling the load_dotenv function.
You must set your OpenAI API key in the .env file, as shown in the next listing. Again, refer to appendix A to find out how to get a key and find a model name.
.env
OPENAI_API_KEY='your-openai-api-key'
After setting the key, you can debug the file by pressing the F5 key or selecting Run > Start Debugging from the VS Code menu. This will run the code, and you should see something like “The capital of France is Paris.”
Remember that the response from a generative model is probabilistic. In this case, the model will most likely give us a correct and consistent answer.
You can play with these probabilities by adjusting the temperature of the request. If you want a model to be more consistent, turn the temperature down to 0, but if you want the model to produce more variation, turn the temperature up. We’ll explore setting the temperature further in the next section.
Digging into the chat completions request and response features can be helpful. We’ll focus on the request first, as shown next. The request encapsulates the intended model, the messages, and the temperature.
response = client.chat.completions.create(
    model="gpt-4-1106-preview",  #1
    messages=[{"role": "system",
               "content": "You are a helpful assistant."},  #2
              {"role": "user", "content": user_message}],  #3
    temperature=0.7,  #4
)
Within the request, the messages block describes a set of messages and roles used in a request. Messages for a chat completions model can be defined in three roles:

system: sets the context, rules, or persona the model follows for the whole exchange
user: carries the input from the user or the calling application
assistant: holds an earlier model reply, passed back in to provide conversation history
The message sent in a single request can encapsulate an entire conversation, as shown in the JSON in the following listing.
[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "What is the capital of France?"
  },
  {
    "role": "assistant",
    "content": "The capital of France is Paris."
  },
  {
    "role": "user",
    "content": "What is an interesting fact of Paris."
  }
],
You can see how this can be applied by opening message_history.py in VS Code and debugging it by pressing F5. After the file runs, be sure to check the output. Then, try to run the sample a few more times to see how the results change.
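Under the hood, carrying a conversation forward is just list management: the application appends each user turn and assistant reply to the message list before making the next request. A minimal sketch in plain Python (no API call needed; the helper names are our own):

```python
def start_conversation(system_prompt="You are a helpful assistant."):
    # The message list itself is the conversation memory.
    return [{"role": "system", "content": system_prompt}]

def add_exchange(messages, user_message, assistant_reply):
    # Record one completed user/assistant exchange.
    messages.append({"role": "user", "content": user_message})
    messages.append({"role": "assistant", "content": assistant_reply})
    return messages

history = start_conversation()
add_exchange(history, "What is the capital of France?",
             "The capital of France is Paris.")
# The next user question rides on top of the full history.
history.append({"role": "user",
                "content": "What is an interesting fact of Paris."})
```

Sending `history` as the `messages` argument reproduces the conversation shown in the preceding listing.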
The results will change from one run to the next because of the high temperature of 0.7. Go ahead and reduce the temperature to 0.0, and run the message_history.py sample a few more times. Keeping the temperature at 0 will produce the same or similar results each time.
Setting a request’s temperature will often depend on your particular use case. Sometimes, you may want to limit the responses’ stochastic nature (randomness). Reducing the temperature to 0 will give consistent results. Likewise, a value of 1.0 will give the most variability in the responses.
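One simple pattern is to make the temperature an explicit parameter of your request-building code, so deterministic and creative calls share the same helper. This is our own sketch; the function name and defaults are assumptions, not part of the OpenAI SDK:

```python
def build_request(user_message, temperature=0.0,
                  system_prompt="You are a helpful assistant.",
                  model="gpt-4-1106-preview"):
    # temperature=0.0 gives the most consistent output;
    # temperature=1.0 gives the most variable output.
    return {
        "model": model,
        "temperature": temperature,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
    }

deterministic = build_request("What is the capital of France?")
creative = build_request("Write a slogan for Paris.", temperature=1.0)
```

The resulting dictionary can be splatted straight into the call from the earlier listing, as in `client.chat.completions.create(**deterministic)`.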
Next, we also want to know what information is being returned for each request. The next listing shows the output format for the response. You can see this output by running the message_history.py file in VS Code.
{
  "id": "chatcmpl-8WWL23up3IRfK1nrDFQ3EHQfhx0U6",
  "choices": [  #1
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "… omitted",
        "role": "assistant",  #2
        "function_call": null,
        "tool_calls": null
      },
      "logprobs": null
    }
  ],
  "created": 1702761496,
  "model": "gpt-4-1106-preview",  #3
  "object": "chat.completion",
  "system_fingerprint": "fp_3905aa4f79",
  "usage": {
    "completion_tokens": 78,  #4
    "prompt_tokens": 48,  #4
    "total_tokens": 126  #4
  }
}
It can be helpful to track the number of input tokens (those used in prompts) and the output tokens (the number returned through completions). Sometimes, minimizing and reducing the number of tokens can be essential. Having fewer tokens typically means LLM interactions will be cheaper, respond faster, and produce better and more consistent results.
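The token counts can be read straight from the usage block of the response shown in the previous listing. The helper below is our own sketch; the sample dictionary mirrors that response, and the characters-per-token figure is only a common rough heuristic (about four characters per token for English), not the model's actual tokenizer:

```python
def report_usage(response: dict) -> str:
    # Pull the token counts from the usage block of a chat completion.
    usage = response["usage"]
    return (f"prompt={usage['prompt_tokens']}, "
            f"completion={usage['completion_tokens']}, "
            f"total={usage['total_tokens']}")

def rough_token_estimate(text: str) -> int:
    # Rule-of-thumb only: roughly 4 characters per token for English.
    # Use the real tokenizer (e.g., tiktoken) when counts must be exact.
    return max(1, len(text) // 4)

sample = {"usage": {"prompt_tokens": 48,
                    "completion_tokens": 78,
                    "total_tokens": 126}}
print(report_usage(sample))
```

Logging these numbers per request is an easy way to spot prompts that are growing more expensive over time.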
That covers the basics of connecting to an LLM and returning responses. Throughout this book, we’ll review and expand on how to interact with LLMs. Until then, we’ll explore in the next section how to load and use open source LLMs.
Commercial LLMs, such as GPT-4 from OpenAI, are an excellent place to start to learn how to use modern AI and build agents. However, commercial agents are an external resource that comes at a cost, reduces data privacy and security, and introduces dependencies. Other external influences can further complicate these factors.
It’s unsurprising that the race to build comparable open source LLMs is growing more competitive every day. As a result, there are now open source LLMs that may be adequate for numerous tasks and agent systems. There have even been so many advances in tooling in just a year that hosting LLMs locally is now very easy, as we’ll see in the next section.
LM Studio is a free download that supports downloading and hosting LLMs and other models locally for Windows, Mac, and Linux. The software is easy to use and offers several helpful features to get you started quickly. Here is a quick summary of steps to download and set up LM Studio:
Figure 2.3 shows the LM Studio window running. From there, you can review the current list of hot models, search for others, and even download. The home page content can be handy for understanding the details and specifications of the top models.
An appealing feature of LM Studio is its ability to analyze your hardware and align it with the requirements of a given model. The software will let you know how well you can run a given model. This can be a great time saver in guiding what models you experiment with.
Enter some text to search for a model, and click Go. You’ll be taken to the search page interface, as shown in figure 2.4. From this page, you can see all the model variations and other specifications, such as context token size. After you click the Compatibility Guess button, the software will even tell you if the model will run on your system.
Click to download any model that will run on your system. You may want to stick with models designed for chat completions, but if your system is limited, work with what you have. In addition, if you’re unsure of which model to use, go ahead and download to try them. LM Studio is a great way to explore and experiment with many models.
After the model is downloaded, you can then load and run the model on the chat page or as a server on the server page. Figure 2.5 shows loading and running a model on the chat page. It also shows the option for enabling and using a GPU if you have one.
To load and run a model, open the drop-down menu at the top middle of the page, and select a downloaded model. A progress bar will appear showing the model loading, and when it’s ready, you can start typing into the UI.
The software even allows you to use some or all of your GPU, if detected, for the model inference. A GPU will generally speed up the model response times in some capacities. You can see how adding a GPU can affect the model’s performance by looking at the performance status at the bottom of the page, as shown in figure 2.5.
Chatting with a model and using or playing with various prompts can help you determine how well a model will work for your given use case. A more systematic approach is using the prompt flow tool for evaluating prompts and LLMs. We’ll describe how to use prompt flow in chapter 9.
LM Studio also allows a model to be run on a server and made accessible using the OpenAI package. We’ll see how to use the server feature and serve a model in the next section.
Running an LLM locally as a server is easy with LM Studio. Just open the server page, load a model, and then click the Start Server button, as shown in figure 2.6. From there, you can copy and paste any of the examples to connect with your model.
You can review an example of the Python code by opening chapter_2/lmstudio_server.py in VS Code. The code is also shown here in listing 2.7. Then, run the code in the VS Code debugger (press F5).
lmstudio_server.py
from openai import OpenAI
client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

completion = client.chat.completions.create(
    model="local-model",  #1
    messages=[
        {"role": "system", "content": "Always answer in rhymes."},
        {"role": "user", "content": "Introduce yourself."}  #2
    ],
    temperature=0.7,
)
print(completion.choices[0].message)  #3
If you encounter problems connecting to the server or experience any other problems, be sure your configuration for the Server Model Settings matches the model type. For example, in figure 2.6, shown earlier, the loaded model differs from the server settings. The corrected settings are shown in figure 2.7.
Now, you can use a locally hosted LLM or a commercial model to build, test, and potentially even run your agents. The following section will examine how to build prompts using prompt engineering more effectively.
A prompt defined for LLMs is the message content used in the request for better response output. Prompt engineering is a new and emerging field that attempts to structure a methodology for building prompts. Unfortunately, prompt building isn’t a well-established science, and there is a growing and diverse set of methods defined as prompt engineering.
Fortunately, organizations such as OpenAI have begun documenting a universal set of strategies, as shown in figure 2.8. These strategies cover various tactics, some requiring additional infrastructure and considerations. As such, the prompt engineering strategies relating to more advanced concepts will be covered in the indicated chapters.
Each strategy in figure 2.8 unfolds into tactics that can further refine the specific method of prompt engineering. This chapter will examine the fundamental Write Clear Instructions strategy. Figure 2.9 shows the tactics for this strategy in more detail, along with examples for each tactic. We’ll look at running these examples using a code demo in the following sections.
The Write Clear Instructions strategy is about being careful and specific about what you ask for. Asking an LLM to perform a task is no different from asking a person to complete the same task. Generally, the more information and context relevant to a task you can specify in a request, the better the response.
This strategy has been broken down into specific tactics you can apply to prompts. To understand how to use those, a code demo (prompt_engineering.py) with various prompt examples is in the chapter 2 source code folder.
Open the prompt_engineering.py file in VS Code, as shown in listing 2.8. This code starts by loading all the JSON Lines files in the prompts folder. Then, it displays the list of files as choices and allows the user to select a prompt option. After selecting the option, the prompts are submitted to an LLM, and the response is printed.
prompt_engineering.py (main())
def main():
    directory = "prompts"
    text_files = list_text_files_in_directory(directory)  #1
    if not text_files:
        print("No text files found in the directory.")
        return

    def print_available():  #2
        print("Available prompt tactics:")
        for i, filename in enumerate(text_files, start=1):
            print(f"{i}. {filename}")

    while True:
        try:
            print_available()  #2
            choice = int(input("Enter … 0 to exit): "))  #3
            if choice == 0:
                break
            elif 1 <= choice <= len(text_files):
                selected_file = text_files[choice - 1]
                file_path = os.path.join(directory, selected_file)
                prompts = load_and_parse_json_file(file_path)  #4
                print(f"Running prompts for {selected_file}")
                for i, prompt in enumerate(prompts):
                    print(f"PROMPT {i+1} --------------------")
                    print(prompt)
                    print(f"REPLY ---------------------------")
                    print(prompt_llm(prompt))  #5
            else:
                print("Invalid choice. Please enter a valid number.")
        except ValueError:
            print("Invalid input. Please enter a number.")
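The helper load_and_parse_json_file called in the listing isn't shown in this excerpt. A minimal sketch of what it might look like follows; this is our own guess at the implementation, assuming each prompt file holds one or more JSON arrays of messages back to back (the real files may encode the multi-line strings shown in later listings differently):

```python
import json

def load_and_parse_json_file(file_path):
    # The prompt files hold one or more JSON arrays of chat messages
    # back to back; decode them one at a time with raw_decode.
    with open(file_path) as f:
        text = f.read()
    decoder = json.JSONDecoder()
    prompts, idx = [], 0
    while idx < len(text):
        # Skip whitespace between the arrays.
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx >= len(text):
            break
        obj, idx = decoder.raw_decode(text, idx)
        prompts.append(obj)
    return prompts
```

Each element of the returned list is one complete message array, ready to submit as a chat completions request.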
A commented-out section from the listing demonstrates how to connect to a local LLM. This will allow you to explore the same prompt engineering tactics applied to open source LLMs running locally. By default, this example uses the OpenAI model we configured previously in section 2.1.1. If you didn’t complete that earlier, please go back and do it before running this one.
Figure 2.10 shows the output of running the prompt engineering tactics tester, the prompt_engineering.py file in VS Code. When you run the tester, you can enter a value for the tactic you want to test and watch it run.
In the following sections, we’ll explore each prompt tactic in more detail. We’ll also examine the various examples.
The basic premise of this tactic is to provide as much detail as possible but also to be careful not to give irrelevant details. The following listing shows the JSON Lines file examples for exploring this tactic.
detailed_queries.jsonl
[  #1
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": "What is an agent?"  #2
  }
]
[
  {
    "role": "system",
    "content": "You are a helpful assistant."
  },
  {
    "role": "user",
    "content": """
      What is a GPT Agent?
      Please give me 3 examples of a GPT agent
    """  #3
  }
]
This example demonstrates the difference between using detailed queries and not. It also goes a step further by asking for examples. Remember, the more relevance and context you can provide in your prompt, the better the overall response. Asking for examples is another way of enforcing the relationship between the question and the expected output.
Adopting personas grants the ability to define an overarching context or set of rules to the LLM. The LLM can then use that context and/or rules to frame all later output responses. This is a compelling tactic and one that we’ll make heavy use of throughout this book.
Listing 2.10 shows an example of employing two personas to answer the same question. This can be an enjoyable technique for exploring a wide range of novel applications, from getting demographic feedback to specializing in a specific task or even rubber ducking.
adopting_personas.jsonl
[
  {
    "role": "system",
    "content": """
      You are a 20 year old female who attends college
      in computer science. Answer all your replies as
      a junior programmer.
    """  #1
  },
  {
    "role": "user",
    "content": "What is the best subject to study."
  }
]
[
  {
    "role": "system",
    "content": """
      You are a 38 year old male registered nurse.
      Answer all replies as a medical professional.
    """  #2
  },
  {
    "role": "user",
    "content": "What is the best subject to study."
  }
]
A core element of agent profiles is the persona. We’ll employ various personas to assist agents in completing their tasks. When you run this tactic, pay particular attention to the way the LLM outputs the response.
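In code, a persona is nothing more than the system message placed at the head of every request. A tiny helper (the name is our own) makes that explicit:

```python
def persona_messages(persona, user_message):
    # The persona system message frames every subsequent reply.
    return [{"role": "system", "content": persona},
            {"role": "user", "content": user_message}]

nurse = persona_messages(
    "You are a 38 year old male registered nurse. "
    "Answer all replies as a medical professional.",
    "What is the best subject to study.")
```

Swapping the first argument is all it takes to re-ask the same question from a different persona, as the two listings above do.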
Delimiters are a useful way of isolating and getting the LLM to focus on some part of a message. This tactic is often combined with other tactics but can work well independently. The following listing demonstrates two examples, but there are several other ways of describing delimiters, from XML tags to using markdown.
using_delimiters.jsonl
[
{
"role": "system",
"content": """
Summarize the text delimited by triple quotes
with a haiku.
""" #1
},
{
"role": "user",
"content": "A gold chain is cool '''but a silver chain is better'''"
}
]
[
{
"role": "system",
"content": """
You will be provided with a pair of statements
(delimited with XML tags) about the same topic.
First summarize the arguments of each statement.
Then indicate which of them makes a better statement
and explain why.
""" #2
},
{
"role": "user",
"content": """
<statement>gold chains are cool</statement>
<statement>silver chains are better</statement>
"""
}
]
When you run this tactic, pay attention to the parts of the text the LLM focuses on when it outputs the response. This tactic can be beneficial for describing information in a hierarchy or other relationship patterns.
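Both delimiting styles from listing 2.11 can be applied mechanically before a message is sent. A minimal sketch (the helper names are illustrative assumptions, not from the listing):

```python
def wrap_triple_quotes(text: str) -> str:
    """Delimit text so the system prompt can refer to 'the text in triple quotes'."""
    return f"'''{text}'''"

def wrap_xml(tag: str, text: str) -> str:
    """Delimit text with an XML-style tag, as in the second example."""
    return f"<{tag}>{text}</{tag}>"

# Build the user content for the pair-of-statements example.
user_content = "\n".join(
    wrap_xml("statement", s)
    for s in ["gold chains are cool", "silver chains are better"]
)
print(wrap_triple_quotes("a silver chain is better"))
print(user_content)
```

Wrapping the text in code, rather than asking the user to do it, keeps the delimiters consistent no matter what the user types.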
Specifying steps is another powerful tactic that can have many uses, including in agents, as shown in listing 2.12. It’s especially powerful when developing prompts or agent profiles for complex multistep tasks. You can specify steps to break down these complex prompts into a step-by-step process that the LLM can follow. In turn, these steps can guide the LLM through multiple interactions over a more extended conversation and many iterations.
specifying_steps.jsonl
[
{
"role": "system",
"content": """
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text in triple single quotes.
Summarize this text in one sentence with a prefix that says 'Summary: '.
Step 2 - Translate the summary from Step 1 into Spanish,
with a prefix that says 'Translation: '.
""" #1
},
{
"role": "user",
"content": "'''I am hungry and would like to order an appetizer.'''"
}
]
[
{
"role": "system",
"content": """
Use the following step-by-step instructions to respond to user inputs.
Step 1 - The user will provide you with text. Answer any questions in
the text in one sentence with a prefix that says 'Answer: '.
Step 2 - Translate the Answer from Step 1 into a dad joke,
with a prefix that says 'Dad Joke: '.""" #2
},
{
"role": "user",
"content": "What is the tallest structure in Paris?"
}
]
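Because the steps in listing 2.12 request labeled prefixes ('Summary: ', 'Translation: '), downstream code can recover each step from the reply. A minimal sketch — the sample reply text here is invented for illustration, not actual model output:

```python
def parse_steps(reply: str, prefixes: list[str]) -> dict[str, str]:
    """Split an LLM reply into the sections named by the step prefixes."""
    result = {}
    for line in reply.splitlines():
        for prefix in prefixes:
            if line.startswith(prefix):
                # Key by the prefix name without the trailing ': '.
                result[prefix.rstrip(": ")] = line[len(prefix):].strip()
    return result

reply = (
    "Summary: The user is hungry and wants an appetizer.\n"
    "Translation: El usuario tiene hambre y quiere un aperitivo."
)
parsed = parse_steps(reply, ["Summary: ", "Translation: "])
print(parsed["Summary"])
print(parsed["Translation"])
```

This is one reason the prefixes matter: they turn free-form output into something an agent can consume step by step.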
Providing examples is an excellent way to guide the desired output of an LLM. There are numerous ways to demonstrate examples to an LLM. The system message/prompt can be a helpful way to emphasize general output. In the following listing, the example is added as the last LLM assistant reply, given the prompt “Teach me about Python.”
providing_examples.jsonl
[
{
"role": "system",
"content": """
Answer all replies in a consistent style that follows the format,
length and style of your previous responses.
Example:
user:
Teach me about Python.
assistant: #1
Python is a programming language developed in 1989
by Guido van Rossum.
Future replies:
The response was only a sentence so limit
all future replies to a single sentence.
""" #2
},
{
"role": "user",
"content": "Teach me about Java."
}
]
Providing examples can also be used to request a particular output format from a complex series of tasks that derive the output. For example, asking an LLM to produce code that matches a sample output is an excellent use of examples. We’ll employ this tactic throughout the book, but other methods exist for guiding output.
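One such alternative, sketched below, is to supply the example as prior user/assistant turns rather than embedding it in the system prompt as listing 2.13 does — a common few-shot pattern. The helper name is an assumption for illustration:

```python
def few_shot_messages(
    system: str, examples: list[tuple[str, str]], prompt: str
) -> list[dict]:
    """Build a message list with example user/assistant turns before the real prompt."""
    messages = [{"role": "system", "content": system}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages

messages = few_shot_messages(
    "Answer all replies in a consistent style that follows the format, "
    "length and style of your previous responses.",
    [("Teach me about Python.",
      "Python is a programming language developed in 1989 by Guido van Rossum.")],
    "Teach me about Java.",
)
print(messages)
```

Because the model treats the example assistant turn as its own previous reply, it tends to imitate that reply's length and tone without needing explicit rules about future responses.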
The tactic of specifying output length can be helpful in not just limiting tokens but also in guiding the output to a desired format. Listing 2.14 shows an example of using two different techniques for this tactic. The first limits the output to fewer than 10 words. This can have the added benefit of making the response more concise and directed, which can be desirable for some use cases. The second example demonstrates limiting output to a concise set of bullet points. This method can help narrow down the output and keep answers short. More concise answers generally mean the output is more focused and contains less filler.
specifying_output_length.jsonl
[
{
"role": "system",
"content": """
Summarize all replies into 10 or fewer words.
""" #1
},
{
"role": "user",
"content": "Please tell me an exciting fact about Paris?"
}
]
[
{
"role": "system",
"content": """
Summarize all replies into 3 bullet points.
""" #2
},
{
"role": "user",
"content": "Please tell me an exciting fact about Paris?"
}
]
Keeping answers brief can have additional benefits when developing multi-agent systems. Any agent system that converses with other agents can benefit from more concise and focused replies. It tends to keep the LLM more focused and reduces noisy communication.
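A length constraint like the one in listing 2.14 can also be checked on the way out, which is useful when agents are policing each other's chatter. A simple sketch (the function name is illustrative):

```python
def within_word_limit(reply: str, limit: int = 10) -> bool:
    """Check that an LLM reply respects a 'limit or fewer words' instruction."""
    return len(reply.split()) <= limit

# A nine-word reply passes the default 10-word limit.
print(within_word_limit("Paris hosts the world's largest art museum, the Louvre."))
```

An agent system could retry or re-prompt whenever this check fails, rather than trusting the model to always honor the instruction.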
Be sure to run through all the examples of the prompt tactics for this strategy. As mentioned, we’ll cover other prompt engineering strategies and tactics in future chapters. We’ll finish this chapter by looking at how to pick the best LLM for your use case.
While being a successful crafter of AI agents doesn’t require an in-depth understanding of LLMs, it’s helpful to be able to evaluate the specifications. Like a computer user, you don’t need to know how to build a processor to understand the differences in processor models. This analogy holds well for LLMs, and while the criteria may be different, it still depends on some primary considerations.
From our previous discussion and look at LM Studio, we can extract some fundamental criteria that will be important to us when considering LLMs. Figure 2.11 explains the essential criteria to define what makes an LLM worth considering for creating a GPT agent or any LLM task.
For our purposes of building AI agents, we need to look at each of these criteria in terms related to the task. Model context size and speed could be considered the sixth and seventh criteria, but they are usually considered variations of a model deployment architecture and infrastructure. An eighth criterion to consider for an LLM is cost, but this depends on many other factors. Here is a summary of how these criteria relate to building AI agents:
There is a lot to consider when choosing which model you want to build a production agent system on. However, picking and working with a single model is usually best for research and learning purposes. If you’re new to LLMs and agents, you’ll likely want to choose a commercial option such as GPT-4 Turbo. Unless otherwise stated, the work in this book will depend on GPT-4 Turbo.
Over time, models will undoubtedly be replaced by better models. So you may need to upgrade or swap out models. To do this, though, you must understand the performance metrics of your LLMs and agents. Fortunately, in chapter 9, we’ll explore evaluating LLMs, prompts, and agent profiles with prompt flow.
Use the following exercises to help you engage with the material in this chapter:
Objective —Use the connecting.py code example to consume a different LLM from OpenAI or another provider.
Tasks:
connecting.py to connect to a different LLM.
Objective —Explore various prompt engineering tactics, and create variations for each.
Tasks:
Objective —Download an LLM using LM Studio, and connect it to prompt engineering tactics.
Tasks:
Objective —Compare the performance of a commercial LLM such as GPT-4 Turbo with an open source model using prompt engineering examples.
Tasks:
Objective —Contrast and compare alternatives for hosting an LLM versus using a commercial model.
Tasks:
As we explore the OpenAI crusade into assistants and what has been hinted at, ultimately, an agent platform called GPT Assistants, we’ll introduce GPT assistants through the ChatGPT interface. Then, we’ll add in several fully developed assistants that can suggest recipes from ingredients, fully analyze data as a data scientist, guide readers through books, and be extended with custom actions. By the end of the chapter, we’ll be ready to build a fully functional agent that can be published to the OpenAI GPT Store.
ChatGPT (ChatGPT Plus, at the time of writing) allows you to build GPT assistants, consume other assistants, and even publish them, as you’ll see by the end of the chapter. When OpenAI announced the release of the GPT Assistants platform, it helped define and solidify the emergence of AI agents. As such, it’s worth a serious review by anyone interested in building and consuming agent systems. First, we’ll look at building GPT assistants through ChatGPT Plus, which requires a premium subscription. If you don’t want to purchase a subscription, browse this chapter as a primer, and chapter 6 will demonstrate consuming the API service later.
Figure 3.1 shows the page for the GPT Store within ChatGPT (https://chatgpt.com/gpts). From here, you can search and explore various GPTs for virtually any task. The amount of usage will typically indicate how well each GPT works, so gauge which works best for you.
Creating your first GPT Assistant is as simple as clicking the Create button and following along with the GPT Builder chat interface. Figure 3.2 shows using the Builder to create a GPT. Working through this exercise a couple of times can be a great way to start understanding an assistant’s requirements.
After working with the Builder, you can open the manual configuration panel, shown in figure 3.3, and edit the GPT directly. You’ll see the name, description, instructions, and conversation starters populated from your conversations with the Builder. This can be a great start, but generally, you’ll want to edit and tweak these properties manually.
If you want to follow along with building your own Culinary Companion, enter the text from listing 3.1 into the instructions. These instructions were partly generated by conversing with the Builder and added based on explicit outputs. The explicit outputs are added to the instructions as rules.
Culinary Companion assists users with a friendly, engaging tone,
reminiscent of the famous chef Julia Child. #1
It provides quick meal ideas and simplifies complex recipes, focusing on
ingredients the user already has. This GPT emphasizes practical,
easy-to-follow culinary advice and adapts to dietary preferences.
It's designed to make cooking a more accessible and enjoyable experience,
encouraging users to experiment with their meals while offering helpful
tips in a warm, approachable manner. #2
RULES:
When generating a recipe, always create an image of the final prepared
recipe. #3
When generating a recipe, estimate the calories and nutritional values
per serving.
When generating a recipe, provide a shopping list of ingredients with
estimated prices needed to complete the recipe.
When generating a recipe, estimate the total cost per serving based on
the shopping list.
Defining rules for an assistant/agent essentially creates a template for what the agent will produce. Adding rules ensures that the GPT output is consistent and aligned with your expectations of how the agent should operate. Defining and giving an agent/assistant a persona provides them with a unique and memorable personality.
Note Giving an agent/assistant a particular personality can make a difference in the type and form of output. Asking a cooking agent to speak as the first celebrity chef, Julia Child, not only provides for a fun tone but also engages more references that may mention or talk about her cooking style and teaching. When constructing an agent/assistant, assigning a particular persona/personality can be helpful.
With just these few steps, we have a culinary companion that not only gives us recipes for ingredients we have on hand but also generates an image of the finished recipe, estimates the nutritional value, creates a shopping list with an estimate of prices, and breaks down the cost per serving.
Try the assistant by requesting a recipe and providing a list of ingredients you have or prefer. Listing 3.2 shows an example of a simple request with extra information to set the mood. Of course, you can add any ingredients or situations you like and then see the results.
I have a bag of prepared frozen chicken strips and I want to make a romantic dinner for two.
Figure 3.4 shows the formatted output results from the GPT provided by the prompt. It certainly looks good enough to eat. All of this output was generated because of the instructions we provided the agent.
While the output results look great, they may not all be factual and correct, and your results may vary. For instance, the GPT added chicken strips to the shopping list when we had already suggested having those ingredients. Furthermore, the prices and estimated nutritional information are just estimates, but this can be resolved later if they interest you.
Out of the box, though, GPT Assistants is quite impressive for quickly building a proof-of-concept assistant or agent. As you’ll see later in the chapter, it also provides an excellent platform for consuming assistants outside ChatGPT. In the next section, we’ll look at more impressive features GPTs provide, such as file uploads and code interpretation.
The GPT Assistants platform has and will likely be extended to include various agent components. Currently, GPT Assistants support what is referred to as knowledge, memory, and actions. In chapter 8, we’ll discuss the details of knowledge and memory, and in chapter 5, we cover the concept of tool use through actions.
In our next exercise, we’ll build an assistant to perform a first-pass data science review of any CSV document we provide. This agent will use the ability or action that allows for coding and code interpretation. When you enable code interpretation, the assistant will allow file uploads by default.
Before we do that, though, we want to design our agent, and what better way to do that than to ask an LLM to build us an assistant? Listing 3.3 shows the prompt requesting ChatGPT (GPT-4) to design a data science assistant. Notice how we’re not asking for everything in a single prompt but instead iterating over the information returned by the LLM.
FIRST PROMPT:
what is a good basic and interesting data science experiment you can
task someone with a single csv file that contains interesting data? #1
SECOND PROMPT:
okay, can you now write all those steps into instructions to be used
for a GPT Agent (LLM agent) to replicate all of the above steps #2
THIRD PROMPT:
What is a famous personality that can embody the agent data scientist
and be able to present data to users? #3
The result of that conversation provided for the assistant instructions shown in listing 3.4. In this case, the assistant was named Data Scout, but feel free to name your assistant what appeals to you.
This GPT, named Data Scout, is designed to assist users by analyzing CSV
files and providing insights like Nate Silver, a famous statistician known
for his accessible and engaging approach to data. Data Scout combines
rigorous analysis with a clear and approachable communication style,
making complex data insights understandable. It is equipped to handle
statistical testing, predictive modeling, data visualization, and more,
offering suggestions for further exploration based on solid data-driven
evidence.
Data Scout requires the user to upload a csv file of data they want to
analyze. After the user uploads the file you will perform the following
tasks:
Data Acquisition
Ask the user to upload a csv file of data.
Instructions: Use the pandas library to read the data from the CSV
file. Ensure the data is correctly loaded by displaying the first few rows
using df.head().
2. Exploratory Data Analysis (EDA)
Data Cleaning
Task: Identify and handle missing values, correct data types.
Instructions: Check for missing values using df.isnull().sum(). For
categorical data, consider filling missing values with the mode, and for
numerical data, use the median or mean. Convert data types if necessary
using df.astype().
Visualization
Task: Create visualizations to explore the data.
Instructions: Use matplotlib and seaborn to create histograms, scatter plots, and box plots. For example, use sns.histplot() for histograms and
sns.scatterplot() for scatter plots.
Descriptive Statistics
Task: Calculate basic statistical measures.
Instructions: Use df.describe() to get a summary of the statistics and
df.mean(), df.median() for specific calculations.
3. Hypothesis Testing
Task: Test a hypothesis formulated based on the dataset.
Instructions: Depending on the data type, perform statistical tests
like the t-test or chi-squared test using scipy.stats. For example, use
stats.ttest_ind() for the t-test between two groups.
4. Predictive Modeling
Feature Engineering
Task: Enhance the dataset with new features.
Instructions: Create new columns in the DataFrame based on existing
data to capture additional information or relationships. Use operations
like df['new_feature'] = df['feature1'] / df['feature2'].
Model Selection
Task: Choose and configure a machine learning model.
Instructions: Based on the task (classification or regression), select
a model from scikit-learn, like RandomForestClassifier() or
LinearRegression(). Configure the model parameters.
Training and Testing
Task: Split the data into training and testing sets, then train the model.
Instructions: Use train_test_split from scikit-learn to divide the
data. Train the model using model.fit(X_train, y_train).
Model Evaluation
Task: Assess the model performance.
Instructions: Use metrics like mean squared error (MSE) or accuracy.
Calculate these using metrics.mean_squared_error(y_test, y_pred) or
metrics.accuracy_score(y_test, y_pred).
5. Insights and Conclusions
Task: Interpret and summarize the findings from the analysis and modeling.
Instructions: Discuss the model coefficients or feature importances.
Draw conclusions about the hypothesis and the predictive analysis. Suggest
real-world implications or actions based on the results.
6. Presentation
Task: Prepare a report or presentation.
Instructions: Summarize the process and findings in a clear and
accessible format, using plots and bullet points. Ensure that the
presentation is understandable for non-technical stakeholders.
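The first two instruction steps above (loading the file and checking for missing values) can be sketched outside the assistant as well. This version uses only the Python standard library rather than the pandas calls named in the instructions, and the inline CSV rows are invented stand-ins for netflix_titles.csv:

```python
import csv
import io

# Stand-in data; a real run would open netflix_titles.csv from disk.
sample = io.StringIO(
    "show_id,type,title,country\n"
    "s1,Movie,Dick Johnson Is Dead,United States\n"
    "s2,TV Show,Blood & Water,\n"
)

# Step 1: load the data and confirm it loaded, like df.head().
rows = list(csv.DictReader(sample))
print(rows[0])

# Step 2: count missing values per column, like df.isnull().sum().
missing = {col: sum(1 for row in rows if not row[col]) for col in rows[0]}
print(missing)
```

Seeing the steps as a few lines of code makes it easier to verify that the assistant's Code Interpreter is doing roughly the same work when it reports on your file.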
生成指令后,您可以将其复制并粘贴到图 3.5 中的“配置”面板。请务必选中相应的复选框,为助手启用“代码解释”工具(技能)。您无需在此处上传文件;启用“代码解释”复选框后,助手将允许上传文件。
After generating the instructions, you can copy and paste them into the Configure panel in figure 3.5. Be sure to give the assistant the Code Interpretation tool (skill) by selecting the corresponding checkbox. You don’t need to upload files here; the assistant will allow file uploads when the Code Interpretation checkbox is enabled.
Now, we can test the assistant by uploading a CSV file and asking questions about it. The source code folder for this chapter contains a file called netflix_titles.csv; the top few rows are summarized in listing 3.5. Of course, you can use any CSV file you want, but this exercise will use the Netflix example. Note that this dataset was downloaded from Kaggle, but you can use any other CSV if you prefer.
netflix_titles.csv (first rows of the data)
show_id,type,title,director,cast,country,date_added, release_year,rating,duration,listed_in,description #1 s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,, United States,"September 25, 2021",2020,PG-13,90 min, Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable." #2
We could upload the file and ask the assistant to do its thing, but for this exercise, we’ll be more specific. Listing 3.6 shows the prompt and uploading the file to engage the assistant (including Netflix_titles.csv in the request). This example filters the results to Canada, but you can, of course, use any country you want to view.
Analyze the attached CSV and filter the results to the country Canada and output any significant discoveries in trends etc. #1
If you encounter problems with the assistant parsing the file, refresh your browser window and try again. Depending on your data and filter, the assistant will now use the Code Interpreter as a data scientist would to analyze and extract trends in the data.
Figure 3.6 shows the output generated for the prompt in listing 3.6 using the netflix_titles.csv file for data. Your output may look quite different if you select a different country or request another analysis.
The data science plots the assistant is building are created by writing and executing code with the Code Interpreter. You can try this with other CSV files or, if you want, different forms of data to analyze. You could even continue iterating with the assistant to update the plots visually or analyze other trends.
Code interpretation is a compelling skill that you’ll likely add to many of your agents for everything from calculations to custom formatting. In the next section, we look at how to extend the capabilities of a GPT through custom actions.
In our next exercise, we’ll demonstrate the use of custom actions, which can significantly extend the reach of your assistant. Adding custom actions to an agent requires several components, from understanding the OpenAPI specification endpoint to connecting to a service. Therefore, before we add custom actions, we’ll build another GPT in the next section to assist us.
Given GPTs’ capabilities, it only makes sense that we use one to assist in building others. In this section, we’ll build a GPT that can help us create a service we can connect as a custom action to another GPT. And yes, we’ll even use an LLM to begin constructing our helper GPT.
The following listing shows the prompt for creating the instructions for our helper GPT. This prompt is intended to generate the instructions for the assistant.
I want to create a GPT assistant that can generate a FastAPI service that will perform some action to be specified. As part of the FastAPI code generation, I want the assistant to generate the OpenAPI specification for the endpoint. Please outline a set of instructions for this agent.
Listing 3.8 shows the bulk of the instructions generated for the prompt. The output was then modified and slightly updated with specific information and other details. Copy and paste those instructions from the file (assistant_builder.txt) into your GPT. Be sure to select the Code Interpreter capability also.
This GPT is designed to assist users in generating FastAPI services
tailored to specific actions, complete with the corresponding OpenAPI
specifications for the endpoints. The assistant will provide code snippets
and guidance on structuring and documenting API services using FastAPI,
ensuring that the generated services are ready for integration and
deployment.
1. Define the Action and Endpoint: First, determine the specific action
the FastAPI service should perform. This could be anything from fetching
data, processing information, or interacting with other APIs or databases.
2. Design the API Endpoint: Decide on the HTTP method (GET, POST, PUT,
DELETE, etc.) and the endpoint URI structure. Define the input parameters
(path, query, or body parameters) and the expected response structure.
3. Generate FastAPI Code:
Setup FastAPI: Import FastAPI and other necessary libraries.
Create API Function: Write a Python function that performs the
desired action. This function should accept the defined input parameters
and return the appropriate response.
4. Decorate the Function: Use FastAPI's decorators (e.g.,
@app.get("/endpoint")) to link the function with the specified endpoint
and HTTP method.
Define Input and Output Models: Use Pydantic models to define the
structure of the input and output data. This ensures validation and
serialization of the data.
5. Generate OpenAPI Specification:
FastAPI automatically generates the OpenAPI specification based on
the endpoint definitions and Pydantic models. Ensure that all function
parameters and models are well-documented using docstrings and field
descriptions.
Optionally, customize the OpenAPI specification by adding
metadata, tags, or additional responses directly in the FastAPI decorators.
6. Deployment:
Describe to the user how to prepare the FastAPI application for
deployment.
Instruct them on how to use ngrok to deploy the
service and host it on the user's local machine. #1
After preparing the assistant, ensure everything is set in the Configure panel (including setting the Code Interpreter checkbox), and then refresh your browser window. This will prepare the assistant for a new session. You can request the kind of service you want to build from here.
Listing 3.9 shows the request to the Custom Action Assistant to create a daily task endpoint. If you understand how APIs work, you can suggest other options, such as POST. Of course, you can also ask the assistant to guide you and create your service.
I want to define a GET endpoint that replies with my list of daily tasks
After you enter the prompt, the assistant will generate the code and instructions for creating and running the FastAPI endpoint. The following listing shows an example of the code generated from the previous request.
daily_tasks_api.py (generated by the assistant)
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
app = FastAPI()
class Task(BaseModel): #1
id: int
description: str
completed: bool
tasks = [ #2
Task(id=1, description="Buy groceries", completed=False),
Task(id=2, description="Read a book", completed=True),
Task(id=3, description="Complete FastAPI project", completed=False),
]
@app.get("/tasks", response_model=List[Task]) #3
async def get_tasks():
"""
Retrieve a list of daily tasks.
"""
return tasks
Enter the code into Visual Studio Code (VS Code), and confirm that fastapi and uvicorn are installed with pip. Then, run the API using the command shown in the following listing, which runs the API in the chapter source file.
uvicorn daily_tasks_api:app --reload #1
Open a browser to http://127.0.0.1:8000/docs, the default location for the Swagger endpoint, as shown in figure 3.7.
Clicking the /openapi.json link will display the OpenAPI specification for the endpoint, as shown in listing 3.12 (JSON converted to YAML). You’ll need to copy and save this document for later use when setting up the custom action on the agent. The endpoint produces JSON, but you can also use specifications written in YAML.
openapi: 3.1.0
info:
title: FastAPI
version: 0.1.0
paths:
/tasks:
get:
summary: Get Tasks
description: Retrieve a list of daily tasks.
operationId: get_tasks_tasks_get
responses:
'200':
description: Successful Response
content:
application/json:
schema:
type: array
items:
$ref: '#/components/schemas/Task'
title: Response Get Tasks Tasks Get
components:
schemas:
Task:
type: object
properties:
id:
type: integer
title: Id
description:
type: string
title: Description
completed:
type: boolean
title: Completed
required:
- id
- description
- completed
title: Task
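The JSON served at /openapi.json can be converted to YAML, as was done for the listing above, with a short script. This sketch assumes the PyYAML package is available; the spec fragment is a stand-in for the full response:

```python
import json

import yaml  # PyYAML, assumed to be installed

def openapi_json_to_yaml(json_text: str) -> str:
    """Convert an OpenAPI specification from JSON text to YAML text."""
    spec = json.loads(json_text)
    # sort_keys=False keeps the spec's original key order
    return yaml.safe_dump(spec, sort_keys=False)

# Minimal fragment standing in for the full /openapi.json response
fragment = '{"openapi": "3.1.0", "info": {"title": "FastAPI", "version": "0.1.0"}}'
print(openapi_json_to_yaml(fragment))
```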
Before connecting an assistant to the service, you must set up and use ngrok to open a tunnel to your local machine running the service. Prompt the GPT to provide the instructions and help you set up ngrok, and run the application to open an endpoint to port 8000 on your machine, as shown in listing 3.13. If you change the port or use a different configuration, you must update it accordingly.
./ngrok authtoken <YOUR_AUTHTOKEN> #1
./ngrok http 8000 #2
After you run ngrok, you’ll see an external URL that you can now use to access the service on your machine. Copy this URL for later use when setting up the assistant. In the next section, we’ll create the assistant that consumes this service as a custom action.
With the service up and running on your machine and accessible externally via the ngrok tunnel, we can build the new assistant. This time, we’ll create a simple assistant to help us organize our daily tasks, where the tasks will be accessible from our locally running task service.
Open the GPT interface and the Configure panel, and copy and paste the instructions shown in listing 3.14 into the new assistant. Be sure to name the assistant and enter a helpful description as well. Also, turn on the Code Interpreter capability to allow the assistant to create the final plot, showing the tasks.
task_organizer_assistant.txt
Task Organizer is designed to help the user prioritize their daily tasks based on urgency and time availability, providing structured guidance on how to categorize tasks by urgency and suggesting optimal time blocks for completing these tasks. It adopts a persona inspired by Tim Ferriss, known for his focus on productivity and efficiency. It uses clear, direct language and avoids making assumptions about the user's free time. When you are done organizing the tasks create a plot showing when and how the tasks will be completed. #1
Click the Create New Action button at the bottom of the panel. Figure 3.8 shows the interface for adding a custom action. You must copy and paste the OpenAPI specification for your service into the window. Then, you must add a new section called servers and populate that with your URL, as shown in the figure.
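If you prefer to script that edit rather than do it by hand, the servers section can be injected with the standard library alone. The ngrok URL below is a placeholder you would replace with the one ngrok prints:

```python
import json

def add_servers(spec_text: str, url: str) -> str:
    """Insert a servers section into an OpenAPI spec given as JSON text."""
    spec = json.loads(spec_text)
    spec["servers"] = [{"url": url}]  # the section the custom action requires
    return json.dumps(spec, indent=2)

# Minimal fragment standing in for the copied specification
spec = '{"openapi": "3.1.0", "paths": {"/tasks": {}}}'
print(add_servers(spec, "https://example.ngrok-free.app"))  # placeholder URL
```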
After the specification is set, you can test it by clicking the Test button. This will run a test, and you’ll see the results shown in the conversation window, as shown in figure 3.9.
Once you’re satisfied that everything is set, refresh your browser window to reset the session, and enter something like the prompt shown in listing 3.15. This will prompt the agent to call the service to get your daily tasks, summarize the output, and solve your task organization dilemma.
how should I organize my tasks for today?
The assistant should produce a plot of the task schedule at the end. If it gets this wrong or the formatting isn’t what you prefer, you can add instructions to specify the format/style the assistant should output.
You can improve the service, but if you make any changes to the API, the specification in the assistant custom actions will need to be updated. From here, though, you can add custom action services run from your computer or hosted as a service.
Note Be aware that unknown users can activate custom actions if you publish an assistant for public consumption, so don’t expose services that charge you a service fee or access private information unless that is your intention. Likewise, services opened through an ngrok tunnel will be exposed through the assistant, which may be of concern. Please be careful when publishing agents that consume custom actions.
Custom actions are a great way to add dynamic functionality to an assistant, whether for personal or commercial use. File uploads are a better option for providing an assistant with static knowledge. The next section will explore using file uploads to extend an assistant’s knowledge.
If you’ve engaged with LLMs, you likely have heard about the retrieval augmented generation (RAG) pattern. Chapter 8 will explore RAG in detail for the application of both knowledge and memory. Detailed knowledge of RAG isn’t required to use the file upload capability, but if you need some foundation, check out that chapter.
The GPT Assistants platform provides a knowledge capability called file uploads, which allows you to populate the GPT with a static knowledge base about anything in various formats. As of writing, the GPT Assistants platform allows you to upload up to 512 MB of documents. In the next two exercises, we’ll look at two different GPTs designed to assist users with consuming books.
Books and written knowledge will always be the backbone of our knowledge base. But reading text is a full-time, concerted effort that many people don’t have time for. Audiobooks made consuming books accessible again; you could listen while multitasking, but not all books transitioned well to audio.
Enter the world of AI and intelligent assistants. With GPTs, we can create an interactive experience between the reader and the book. No longer is the reader forced to consume a book page by page but rather as a whole.
To demonstrate this concept, we’ll build a GPT based on a classic math text called Calculus Made Easy, by Silvanus P. Thompson. The book is freely available through the Gutenberg Press website. While it’s more than a hundred years old, it still provides solid background material.
Note If you’re serious about learning calculus but this assistant is still too advanced, check out a great book by Clifford A. Pickover called Calculus and Pizza. It’s a great book for learning calculus or just to get an excellent refresher. You could also try making your Calculus and Pizza assistant if you have an eBook version. Unfortunately, copyright laws would prevent you from publishing this GPT without permission.
Open ChatGPT, go to My GPTs, create a new GPT, click the Configure tab, and then upload the file, as shown in figure 3.10. Upload the book from the chapter’s source code folder: chapter_03/calculus_made_easy.pdf. This will add the book to the GPT’s knowledge.
Scroll up and add the instructions shown in listing 3.16. The initial preamble text was generated by conversing with the GPT Builder. After updating the preamble text, a personality was added by asking ChatGPT for famous mathematicians. Then, finally, rules were added to provide additional guidance to the GPT on what explicit outcomes we want.
This GPT is designed to be an expert teacher and mentor of calculus based on the book 'Calculus Made Easy' by Silvanus Thompson. A copy of the book is uploaded at calculus_made_easy.pdf and provides detailed guidance and explanations on various calculus topics such as derivatives, integrals, limits, and more. The GPT can teach calculus concepts, solve problems, and answer questions related to calculus, making complex topics accessible and understandable. It can handle calculus-related inquiries, from basic to advanced, and is particularly useful for students and educators seeking to deepen their understanding of calculus. #1 Answer as the famous mathematician Terence Tao. Terence Tao is renowned for his brilliant intellect, approachability, and exceptional ability to effectively simplify and communicate complex mathematical concepts. #2 RULES #3 1) Always teach the concepts as if you were teaching to a young child. 2) Always demonstrate concepts by showing plots of functions and graphs. 3) Always ask if the user wants to try a sample problem on their own. Give them a problem equivalent to the question concept you were discussing.
After updating the assistant, you can try it in the preview window or the book version by searching for Calculus Made Easy in the GPT Store. Figure 3.11 shows a snipped example of interaction with the GPT. The figure shows that the GPT can generate plots to demonstrate concepts or ask questions.
This GPT demonstrates the ability of an assistant to use a book as a companion teaching reference. Only a single book was uploaded in this exercise, but multiple books or other documents could be uploaded. As this feature and the technology mature, it’s conceivable that an entire course could one day be taught using a GPT.
We’ll move away from technical material and embrace fiction to demonstrate the use of knowledge. In the next section, we’ll look at how knowledge from file uploads can be used for search and reference.
The GPT Assistants platform’s file upload capability supports up to 512 MB of uploads for a single assistant. This feature alone provides powerful capabilities for document search and other applications at personal and small-to-medium business/project scales.
Imagine uploading a whole collection of files. You can now search, compare, contrast, organize, and collate all with one assistant. This feature alone within GPT Assistants will disrupt how we search for and analyze documents. In chapter 6, we’ll examine how direct access to the OpenAI Assistants API can increase the number of documents.
For this next exercise, we’ll employ an assistant with knowledge of multiple books or documents. This technique could be applied to any supported document, but this assistant will consume classic texts about robots. We’ll name this assistant the Classic Robot Reads GPT.
Start by creating a new GPT assistant in the ChatGPT interface. Then, upload the instructions in listing 3.17, and name and describe the assistant. These instructions were generated in part through the GPT Builder and then edited.
This GPT, Classic Robot Reads, uses the persona of Isaac Asimov and will reply as the famous robot author. #1 This GPT only references and discusses the books in its knowledge base of uploaded files. #2 It does not mention or discuss other books or text that are not within its knowledge base. #2 RULES Refer only to text within your knowledge base #2 Always provide 3 examples for any query the user asks #3 Always ask the user if they require anything further #4
After completing those steps, you can upload the files from the chapter’s source called gutenberg_robot_books. Figure 3.12 demonstrates uploading multiple files at a time. The maximum number of files you can upload at a time will vary according to the sizes of the files.
You can start using it after uploading the documents, setting the instructions, and giving the assistant a name and an image. Search is the most basic application of a knowledge assistant, and other use cases in the form of prompts are shown in table 3.1.
| Use case | Example prompt | Result |
|---|---|---|
| Search | Find this phrase in your knowledge base: "robot servant". | Returns documents and excerpts from them |
| Compare | Find the three books most similar in writing style and content. | Returns the three most similar documents |
| Contrast | Find the three most different books. | Returns the most different books in the collection |
| Order | In what order should I read these books? | Returns an ordered sequence of the books |
| Classify | Which of these books is the most modern? | Classifies the documents |
| Generation | Write a fictional passage imitating what you know about robot servants. | Generates new content based on its knowledge base |
These use cases are just a sample of the many things possible with an AI knowledge assistant. While this feature may not be poised to disrupt enterprise search, it gives smaller organizations and individuals more access to their documents. It allows the creation of assistants as a form of knowledge that can be exposed publicly. In the next section, we’ll look at how to make assistants consumable by all.
Once you’re happy with your GPT, you can use it or share it with others by providing a link. Consuming GPT assistants through ChatGPT currently requires a Plus subscription. To publish your GPT for others, click the Share button, and select your sharing option, as shown in figure 3.13.
Whether you share your GPT with friends and colleagues or publicly in the GPT Store, the assistant’s usage is taken from the account using it, not the publisher. This means if you have a particularly expensive GPT that generates a lot of images, for example, it won’t affect your account while others use it.
At the time of writing, OpenAI tracks the resource usage of your ChatGPT account, including that used for GPTs. If you hit a resource usage limit and get blocked, your ChatGPT account will also be blocked. Blockages typically only last a couple of hours, but this can undoubtedly be more than a little annoying.
Therefore, we want to ensure that users using your GPT don’t exceed their resource usage limits for regular use. Following is a list of features that increase resource usage while using the GPT:
Note Moore’s Law states that computers will double in power every two years while costing half as much. LLMs are now doubling in power about every six months, thanks to optimization and increasing GPU power. Combined with costs being cut at least in half over the same period, this likely means current resource limits on vision and image-generation models may no longer apply. However, limits on services such as code interpretation and file uploads will likely remain the same.
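As a rough sketch of that compounding (the six-month doubling rate is the claim above, not a measured constant):

```python
# Four 6-month periods in two years; power doubles and cost halves each period
periods = 4

power_multiple = 2 ** periods       # capability relative to today
cost_multiple = (1 / 2) ** periods  # cost relative to today

print(f"After two years: {power_multiple}x power at {cost_multiple}x cost")
```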
Making your assistant aware of resource usage can be as simple as adding the rule shown in listing 3.18 to the assistant’s instructions. The instructions can be just a statement relaying the warning to the user and making the assistant aware. You could even ask the assistant to limit its usage of certain features.
RULE: When generating images, ensure the user is aware that creating multiple images quickly could temporarily block their account.
Guiding your assistant to be more resource conscious in the end makes your assistant more usable. It also helps prevent angry users who unknowingly get blocked using your assistant. This may be important if you plan on releasing your GPT, but before that, let’s investigate the economics in the next section.
Upon the release of GPT Assistants and the GPT Store, OpenAI announced the potential for a future profit-sharing program for those who published GPTs. While we’re still waiting to hear more about this program, many have speculated what this may look like.
Some have suggested the store may return only 10% to 20% of profits to the builders. This is far less than the percentage on other app platforms but requires much less technical knowledge and fewer resources. The GPT Store is flooded with essentially free assistants, provided you have a Plus subscription, but that may change in the future. Regardless, there are also several reasons why you may want to build public GPTs:
While the current form of the GPT Store is for ChatGPT subscribers, if the current trend with OpenAI continues, we’ll likely see a fully public GPT Store. Public GPTs have the potential to disrupt the way we search, investigate products and services, and consume the internet. In the last section of this chapter, we’ll examine how to publish a GPT and some important considerations.
Okay, you’re happy with your GPT and how it operates, and you see real benefit from giving it to others. Publishing GPTs for public (subscribers) consumption is easy, as shown in figure 3.14. After selecting the GPT Store as the option and clicking Save, you’ll now have the option to set the category and provide links back to you.
That is easy, so here are a few more things you’ll want to consider before publishing your GPT:
Further requirements may likely emerge as the GPT Store matures. The business model remains to be established, and other learnings will likely follow. Whether you decide to build GPTs for yourself or others, doing so can help improve your understanding of how to build agents and assistants. As we’ll see throughout the rest of this book, GPT assistants are a useful foundation for your knowledge.
Complete the following exercises to improve your knowledge of the material:
Objective —Create a simple GPT assistant using the ChatGPT interface.
Tasks:
Objective —Develop a GPT assistant that can analyze CSV files and provide insights.
Tasks:
Objective —Extend a GPT assistant with a custom action using a FastAPI service.
Tasks:
Objective —Build an assistant with specialized knowledge from uploaded documents.
Tasks:
Objective —Publish your GPT assistant to the GPT Store and share it with others.
Tasks:
Now let’s take a journey from AutoGen to CrewAI, two well-established multi-agent platforms. We’ll start with AutoGen, a Microsoft project that supports multiple agents and provides a studio to ease you into working with them. From there, we’ll get hands-on, coding AutoGen agents to solve tasks using conversations and group chat collaborations.
Then, we’ll transition to CrewAI, a self-proclaimed enterprise agentic system that takes a different approach. CrewAI balances role-based and autonomous agents within flexible sequential or hierarchical task management systems. We’ll explore how CrewAI can solve diverse and complex problems.
Multi-agent systems incorporate many of the same tools single-agent systems use but benefit from the ability to provide outside feedback and evaluation to other agents. This ability to support and criticize agent solutions internally gives multi-agent systems more power. We’ll begin our introduction to multi-agent systems with AutoGen Studio in the next section.
AutoGen Studio is a powerful tool that employs multiple agents behind the scenes to solve tasks and problems a user directs. This tool has been used to develop some of the more complex code in this book. For that reason and others, it’s an excellent introduction to a practical multi-agent system.
Figure 4.1 shows a schematic diagram of the agent connection/communication patterns AutoGen employs. AutoGen is a conversational multi-agent platform because communication is done using natural language. Natural language conversation seems to be the most natural pattern for agents to communicate, but it’s not the only method, as you’ll see later.
AutoGen supports various conversational patterns, from group and hierarchical to the more common and simpler proxy communication. In proxy communication, one agent acts as a proxy and directs communication to relevant agents to complete tasks. A proxy is similar to a waiter taking orders and delivering them to the kitchen, which cooks the food. Then, the waiter serves the cooked food.
The basic pattern in AutoGen uses a UserProxy and one or more assistant agents. Figure 4.2 shows the user proxy taking direction from a human and then directing an assistant agent enabled to write code to perform the tasks. Each time the assistant completes a task, the proxy agent reviews, evaluates, and provides feedback to the assistant. This iteration loop continues until the proxy is satisfied with the results.
The benefit of the proxy is that it works to replace the required human feedback and evaluation, and, in most cases, it does a good job. While it doesn’t eliminate the need for human feedback and evaluation, it produces much more complete results overall. And, while the iteration loop is time consuming, it’s time you could be drinking a coffee or working on other tasks.
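The iteration loop described above can be sketched in plain Python. The two functions are stand-ins for the agents, not AutoGen classes or its real API:

```python
def assistant(task: str, feedback: str) -> str:
    """Stand-in assistant agent: drafts a solution, revising when given feedback."""
    draft = f"solution for {task!r}"
    return draft + " (revised)" if feedback else draft

def proxy_review(result: str) -> str:
    """Stand-in user proxy: returns feedback, or an empty string when satisfied."""
    return "" if "(revised)" in result else "please refine the solution"

def iterate(task: str, max_rounds: int = 5) -> str:
    """Proxy-assistant loop: repeat until the proxy approves or rounds run out."""
    result, feedback = "", ""
    for _ in range(max_rounds):
        result = assistant(task, feedback)
        feedback = proxy_review(result)
        if not feedback:  # proxy is satisfied; stop iterating
            break
    return result
```

In AutoGen, the same loop is driven by natural-language messages exchanged between the user proxy and the assistant rather than by return values.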
AutoGen Studio is a tool developed by the AutoGen team that provides a helpful introduction to conversable agents. In the next exercise, we’ll install Studio and run some experiments to see how well the platform performs. These tools are still in a rapid development cycle, so if you encounter any problems, consult the documentation on the AutoGen GitHub repository.
Open the chapter_04 folder in Visual Studio Code (VS Code), create a local Python virtual environment, and install the requirements.txt file. If you need assistance with this, consult appendix B to install all of this chapter’s exercise requirements.
Open a terminal in VS Code (Ctrl-`, Cmd-`) pointing to your virtual environment, and run AutoGen Studio using the command shown in listing 4.1. You’ll first need to define an environment variable for your OpenAI key. Ports 8080 and 8081 are popular, so if you have other services running, change the port to 8082 or another port of your choice.
# set environment variable in Bash (Git Bash)
export OPENAI_API_KEY="<your API key>" #1
# set environment variable in PowerShell
$env:OPENAI_API_KEY = "<your API key>" #1
autogenstudio ui --port 8081 #2
Navigate your browser to the AutoGen Studio interface shown in figure 4.3 (as of this writing). While there may be differences, one thing is for sure: the primary interface will still be chat. Enter a complex task that requires coding. The example used here is Create a plot showing the popularity of the term GPT Agents in Google search.
The agent assistant generates code snippets to perform or complete various subtasks as the agents work together through the task in the example. The user proxy agent then attempts to execute those code snippets and assesses the output. In many cases, proving the code runs and produces the required output is sufficient for the user proxy agent to approve the task’s completion.
If you encounter any problems with the assistant agent requests, ask the proxy agent to try a different method or another problem. This highlights a bigger problem with agentic systems that use packages or libraries that are outdated and no longer work. For this reason, it's generally better to get agents to execute actions rather than build code to perform actions as tools.
Tip Executing AutoGen and AutoGen Studio using Docker is recommended, especially when working with code that may affect the operating system. Docker can isolate and virtualize the agents' environment, thus containing potentially harmful code. It can also prevent secondary windows or websites from blocking the agent process.
Figure 4.4 shows the agent’s completion of the task. The proxy agent will collect any generated code snippet, images, or other documents and append them to the message. You can also review the agent conversation by opening the Agent Messages expander. In many cases, if you ask the agent to generate plots or applications, secondary windows will open showing those results.
Amazingly, the agents will perform most tasks nicely and complete them well. Depending on the complexity of the task, you may need to further iterate with the proxy. Sometimes, an agent may only go so far to complete a task because it lacks the required skills. In the next section, we’ll look at how to add skills to agents.
Skills and tools, or actions, as we refer to them in this book, are the primary means by which agents can extend themselves. Actions give agents the ability to execute code, call APIs, or even further evaluate and inspect generated output. AutoGen Studio currently begins with just a basic set of tools to fetch web content or generate images.
Note Many agentic systems employ the practice of allowing agents to code to solve goals. However, we discovered that code can be easily broken, needs to be maintained, and can change quickly. Therefore, as we’ll discuss in later chapters, it’s better to provide agents with skills/actions/tools to solve problems.
In the following exercise scenario, we’ll add a skill/action to inspect an image using the OpenAI vision model. This will allow the proxy agent to provide feedback if we ask the assistant to generate an image with particular content.
With AutoGen Studio running, go to the Build tab and click Skills, as shown in figure 4.5. Then, click the New Skill button to open a code panel where you can copy–paste code to. From this tab, you can also configure models, agents, and agent workflows.
Enter the code shown in listing 4.2 and also provided in the book’s source code as describe_image.py. Copy and paste this code into the editor window, and then click the Save button at the bottom.
describe_image.py
import base64
import requests
import os
def describe_image(image_path='animals.png') -> str:
"""
Uses GPT-4 Vision to inspect and describe the contents of the image.
:param input_path: str, the name of the PNG file to describe.
"""
api_key = os.environ['OPENAI_API_KEY']
# Function to encode the image
def encode_image(image_path): #1
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode('utf-8')
# Getting the base64 string
base64_image = encode_image(image_path)
headers = {
"Content-Type": "application/json",
"Authorization": f"Bearer {api_key}"
}
payload = {
"model": "gpt-4-turbo",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "What’s in this image?"
},
{
"type": "image_url",
"image_url": {
"url": f"data:image/jpeg;base64,{base64_image}" #2
}
}
]
}
],
"max_tokens": 300
}
response = requests.post(
"https://api.openai.com/v1/chat/completions",
headers=headers,
json=payload)
return response.json()["choices"][0]["message"]["content"] #3
The describe_image function uses the OpenAI GPT-4 vision model to describe what is in the image. This skill can be paired with the existing generate_image skill as a quality assessment. The agents can confirm that the generated image matches the user’s requirements.
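The core plumbing of this skill is the base64 data URL the image is embedded in before being sent to the vision endpoint. The following minimal sketch isolates just that encoding step (the function name and sample bytes here are ours, not part of listing 4.2):

```python
import base64

def encode_image_bytes(data: bytes) -> str:
    # Base64-encode raw image bytes, as the encode_image helper does in listing 4.2.
    return base64.b64encode(data).decode("utf-8")

# The vision API receives the image inline as a data URL rather than a file path.
payload_url = f"data:image/png;base64,{encode_image_bytes(b'fake-image-bytes')}"
```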
After the skill is created, it must be added to the specific agent workflow and agent before it can be used. Figure 4.6 demonstrates adding the new skill to the primary assistant agent in the general or default agent workflow.
Now that the skill is added to the primary assistant, we can task the agent with creating a specific image and validating it using the new describe_image skill. Because image generators notoriously struggle with correct text, we’ll create an exercise task to do just that.
Enter the text shown in listing 4.3 to prompt the agents to create a book image cover for this book. We’ll explicitly say that the text needs to be correct and insist that the agent uses the new describe_image function to verify the image.
Please create a cover for the book GPT Agents In Action, use the describe_image skill to make sure the title of the book is spelled correctly on the cover
After the prompt is entered, wait for a while, and you may get to see some dialogue exchanged about the image generation and verification process. In the end, though, if everything works correctly, the agents will return with the results shown in figure 4.7.
Remarkably, the agent coordination completed the task in just a couple of iterations. Along with the images, you can also see the various helper code snippets generated to assist with task completion. AutoGen Studio is impressive in its ability to integrate skills that the agents can further adapt to complete some goal. The following section will show how these powerful agents are implemented in code.
While AutoGen Studio is a fantastic tool for understanding multi-agent systems, we must look into the code. Fortunately, coding multiple agent examples with AutoGen is simple and easy to run. We’ll cover the basic AutoGen setup in the next section.
This next exercise will look at coding a basic multi-agent system that uses a user proxy and conversable agent. Before we do that, though, we want to make sure AutoGen is installed and configured correctly.
Open a terminal in VS Code, and either follow the full chapter 4 install directions in appendix B or run the pip command in listing 4.4. If you've already installed the requirements.txt file, you're ready to run AutoGen.
pip install pyautogen
Next, copy the chapter_04/OAI_CONFIG_LIST.example to OAI_CONFIG_LIST, removing .example from the file name. Then, open the new file in VS Code, and enter your OpenAI or Azure configuration in the OAI_CONFIG_LIST file in listing 4.5. Fill in your API key, model, and other details per your API service requirements. AutoGen will work with any model that adheres to the OpenAI client. That means you can use local LLMs via LM Studio or other services such as Groq, Hugging Face, and more.
OAI_CONFIG_LIST
[
{
"model": "gpt-4", #1
"api_key": "<your OpenAI API key here>", #2
"tags": ["gpt-4", "tool"]
},
{
"model": "<your Azure OpenAI deployment name>", #3
"api_key": "<your Azure OpenAI API key here>", #4
"base_url": "<your Azure OpenAI API base here>", #5
"api_type": "azure",
"api_version": "2024-02-15-preview"
}
]
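Because AutoGen accepts any endpoint that speaks the OpenAI client protocol, the same config list can point at a local model. The entry below is an illustrative sketch only: the model name and API key are placeholders, and http://localhost:1234/v1 is LM Studio's default local server address.

```json
{
  "model": "local-model",
  "api_key": "lm-studio",
  "base_url": "http://localhost:1234/v1"
}
```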
Now, we can look at the code for a basic multi-agent chat using the out-of-the-box UserProxy and ConversableAgent agents. Open autogen_start.py in VS Code, shown in the following listing, and review the parts before running the file.
autogen_start.py
from autogen import ConversableAgent, UserProxyAgent, config_list_from_json
config_list = config_list_from_json(
env_or_file="OAI_CONFIG_LIST") #1
assistant = ConversableAgent(
"agent",
llm_config={"config_list": config_list}) #2
user_proxy = UserProxyAgent( #3
"user",
code_execution_config={
"work_dir": "working",
"use_docker": False,
},
human_input_mode="ALWAYS",
is_termination_msg=lambda x: x.get("content", "")
.rstrip()
.endswith("TERMINATE"), #4
)
user_proxy.initiate_chat(assistant, message="write a solution
↪ for fizz buzz in one line?") #5
Run the file in the VS Code debugger (F5). The code in listing 4.6 uses a simple task to demonstrate code writing. Listing 4.7 shows a few alternatives to choose from. These coding tasks are also some of the author's regular baselines for assessing an LLM's strength in coding.
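For reference, a one-line fizz buzz of the kind the assistant typically returns looks like the following sketch (the agent's actual generated code will vary from run to run):

```python
# One-line fizz buzz: a single list comprehension covering 1..100.
fizzbuzz = ["FizzBuzz" if i % 15 == 0 else "Fizz" if i % 3 == 0
            else "Buzz" if i % 5 == 0 else str(i) for i in range(1, 101)]
print(fizzbuzz[:15])
```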
write a Python function to check if a number is prime
code a classic snake game using Pygame #1
code a classic asteroids game in Python using Pygame #1
After a few seconds, the assistant will respond to the proxy with a solution. At this point, the proxy will prompt you for feedback. Press Enter, essentially giving no feedback, and the proxy will run the code to verify that it operates as expected.
Impressively, the proxy agent will even take cues to install required packages such as Pygame. Then it will run the code, and you'll see the output in the terminal or in a new window or browser. If the code spawned a new window or browser, you can play the game or use the interface.
Note that the spawned window/browser won’t close on Windows and will require exiting the entire program. To avoid this problem, run the code through Windows Subsystem for Linux (WSL) or Docker. AutoGen explicitly recommends using Docker for code execution agents, and if you’re comfortable with containers, this is a good option.
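If you follow that advice, the change is confined to the proxy's code_execution_config. The sketch below assumes Docker is installed and running; the image name is our choice, and passing True instead selects AutoGen's default image:

```python
# Sketch: a code_execution_config that runs generated code inside Docker.
# use_docker accepts True (default image) or a specific image name.
docker_exec_config = {
    "work_dir": "working",
    "use_docker": "python:3-slim",  # hypothetical image choice
}
```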
Either way, after the proxy generates and runs the code, the working_dir folder set earlier in listing 4.6 should now have a Python file with the code. This will allow you to run the code at your leisure, make changes, or even ask for improvements, as we’ll see. In the next section, we’ll look at how to improve the capabilities of the coding agents.
One powerful benefit of multi-agent systems is the multiple roles/personas you can automatically assign when completing tasks. Generating or helping to write code can be an excellent advantage to any developer, but what if that code was also reviewed and tested? In the next exercise, we’ll add another agent critic to our agent system to help with coding tasks. Open autogen_coding_critic.py, as shown in the following listing.
autogen_coding_critic.py
from autogen import AssistantAgent, UserProxyAgent, config_list_from_json
config_list = config_list_from_json(env_or_file="OAI_CONFIG_LIST")
user_proxy = UserProxyAgent(
"user",
code_execution_config={
"work_dir": "working",
"use_docker": False,
"last_n_messages": 1,
},
human_input_mode="ALWAYS",
is_termination_msg=lambda x:
x.get("content", "").rstrip().endswith("TERMINATE"),
)
engineer = AssistantAgent(
name="Engineer",
llm_config={"config_list": config_list},
system_message="""
You are a professional Python engineer, known for your expertise in
software development.
You use your skills to create software applications, tools, and
games that are both functional and efficient.
Your preference is to write clean, well-structured code that is easy
to read and maintain.
""", #1
)
critic = AssistantAgent(
name="Reviewer",
llm_config={"config_list": config_list},
system_message="""
You are a code reviewer, known for your thoroughness and commitment
to standards.
Your task is to scrutinize code content for any harmful or
substandard elements.
You ensure that the code is secure, efficient, and adheres to best
practices.
You will identify any issues or areas for improvement in the code
and output them as a list.
""", #2
)
def review_code(recipient, messages, sender, config): #3
return f"""
Review and critique the following code.
{recipient.chat_messages_for_summary(sender)[-1]['content']}
""" #3
user_proxy.register_nested_chats( #4
[
{
"recipient": critic,
"message": review_code,
"summary_method": "last_msg",
"max_turns": 1,
}
],
trigger=engineer, #4
)
task = """Write a snake game using Pygame."""
res = user_proxy.initiate_chat(
recipient=engineer,
message=task,
max_turns=2,
summary_method="last_msg" #5
)
Run the autogen_coding_critic.py file in VS Code in debug mode, and watch the dialog between the agents. This time, after the code returns, the critic will also be triggered to respond. Then, the critic will add comments and suggestions to improve the code.
Nested chats work well for supporting and controlling agent interactions, but we'll see a better approach shortly. First, though, we'll review the importance of the AutoGen cache in the next section.
AutoGen can consume many tokens over chat iterations as a conversable multi-agent platform. If you ask AutoGen to work through complex or novel problems, you may even encounter token limits on your LLM; because of this, AutoGen supports several methods to reduce token usage.
AutoGen uses caching to store progress and reduce token usage. Caching is enabled by default, and you may have already encountered it. If you check your current working folder, you’ll notice a .cache folder, as shown in figure 4.8. Caching allows your agents to continue conversations if they get interrupted.
In code, you can control the cache folder for your agent’s run, as shown in listing 4.9. By wrapping the initiate_chat call with the with statement, you can control the location and seed for the cache. This will allow you to save and return to long-running AutoGen tasks in the future by just setting the cache_seed for the previous cache.
with Cache.disk(cache_seed=42) as cache: #1
res = user_proxy.initiate_chat(
recipient=engineer,
message=task,
max_turns=2,
summary_method="last_msg",
cache=cache, #2
)
This caching ability allows you to continue operations from the previous cache location and captures previous runs. It can also be a great way to demonstrate and inspect how an agent conversation generated the results. In the next section, we’ll look at another conversational pattern in which AutoGen supports group chat.
One problem with chat delegation and nested chats or conversations is the conveyance of information. If you’ve ever played the telephone game, you’ve witnessed this firsthand and experienced how quickly information can change over iterations. With agents, this is certainly no different, and chatting through nested or sequential conversations can alter the task or even the desired result.
Figure 4.9 shows the difference between nested and collaborative group chats. We used the nested chat feature in the previous section to build a nested agent chat. In this section, we use the group chat to provide a more collaborative experience.
Open autogen_coding_group.py, with relevant parts shown in listing 4.10. The code is similar to the previous exercise but now introduces GroupChat and GroupChatManager. The agents and messages are held by the group chat, similar to a messaging channel in applications such as Slack or Discord. The chat manager coordinates the message responses to reduce conversation overlap.
autogen_coding_group.py (relevant sections)
user_proxy = UserProxyAgent(
"user",
code_execution_config={
"work_dir": "working",
"use_docker": False,
"last_n_messages": 3,
},
human_input_mode="NEVER", #1
)
llm_config = {"config_list": config_list}
engineer = AssistantAgent(… #2
critic = AssistantAgent(… #2
groupchat = GroupChat(agents=[user_proxy,
engineer,
critic],
messages=[],
max_round=20) #3
manager = GroupChatManager(groupchat=groupchat,
llm_config=llm_config) #4
task = """Write a snake game using Pygame."""
with Cache.disk(cache_seed=43) as cache:
res = user_proxy.initiate_chat(
recipient=manager,
message=task,
cache=cache,
)
Run this exercise, and you'll see how the agents collaborate. The engineer will now take feedback from the critic and undertake operations to address the critic's suggestions. This also allows the proxy to engage in the entire conversation.
Group conversations are an excellent way to strengthen your agents' abilities as they collaborate on tasks. However, they are also substantially more verbose and token expensive. Of course, as LLMs mature, their context token windows grow and the price of token processing falls. As token windows increase, concerns over token consumption may eventually go away.
AutoGen is a powerful multi-agent platform that can be experienced using a web interface or code. Whatever your preference, this agent collaboration tool is an excellent platform for building code or other complex tasks. Of course, it isn’t the only platform, as you’ll see in the next section, where we explore a newcomer called CrewAI.
CrewAI is relatively new to the realm of multi-agent systems. Where AutoGen was initially developed from research and then extended, CrewAI is built with enterprise systems in mind. As such, the platform is more robust, though less extensible in some areas.
With CrewAI, you build a crew of agents to focus on specific areas of a task goal. Unlike AutoGen, CrewAI doesn’t require the use of the user proxy agent but instead assumes the agents only work among themselves.
Figure 4.10 shows the main elements of the CrewAI platform, how they connect together, and their primary function. It shows a sequential-processing agent system with generic researcher and writer agents. Agents are assigned tasks that may also include tools or memory to assist them.
CrewAI supports two primary forms of processing: sequential and hierarchical. Figure 4.10 shows the sequential process by iterating across the given agents and their associated tasks. In the next section, we dig into some code to set up a crew and employ it to complete a goal and create a good joke.
CrewAI requires more setup than AutoGen, but this also allows for more control, and the additional guides provide more specific context to steer the agents in completing the given task. This isn't without problems, but it does offer more control than AutoGen out of the box.
Open crewai_introduction.py in VS Code and look at the top section, as shown in listing 4.11. Many settings are required to configure an agent, including the role, goal, verboseness, memory, backstory, delegation, and even tools (not shown). In this example, we’re using two agents: a senior joke researcher and a joke writer.
crewai_introduction.py (agents section)
from crewai import Agent, Crew, Process, Task
from dotenv import load_dotenv
load_dotenv()
joke_researcher = Agent( #1
role="Senior Joke Researcher",
goal="Research what makes things funny about the following {topic}",
verbose=True, #2
memory=True, #3
backstory=( #4
"Driven by slapstick humor, you are a seasoned joke researcher"
"who knows what makes people laugh. You have a knack for finding"
"the funny in everyday situations and can turn a dull moment into"
"a laugh riot."
),
allow_delegation=True, #5
)
joke_writer = Agent( #6
role="Joke Writer",
goal="Write a humourous and funny joke on the following {topic}",
verbose=True, #7
memory=True, #8
backstory=( #9
"You are a joke writer with a flair for humor. You can turn a"
"simple idea into a laugh riot. You have a way with words and"
"can make people laugh with just a few lines."
),
allow_delegation=False, #5
)
Moving down the code, we next see the tasks, as shown in listing 4.12. Tasks denote an agent’s process to complete the primary system goal. They also link an agent to work on a specific task, define the output from that task, and may include how it’s executed.
crewai_introduction.py (tasks section)
research_task = Task( #1
description=(
"Identify what makes the following topic:{topic} so funny."
"Be sure to include the key elements that make it humourous."
"Also, provide an analysis of the current social trends,"
"and how it impacts the perception of humor."
),
expected_output="A comprehensive 3 paragraphs long report
↪ on the latest jokes.", #2
agent=joke_researcher, #3
)
write_task = Task( #4
description=(
"Compose an insightful, humourous and socially aware joke on {topic}."
"Be sure to include the key elements that make it funny and"
"relevant to the current social trends."
),
expected_output="A joke on {topic}.", #5
agent=joke_writer, #3
async_execution=False, #6
output_file="the_best_joke.md", #7
)
Now, we can see how everything comes together as the Crew at the bottom of the file, as shown in listing 4.13. Again, many options can be set when building the Crew, including the agents, tasks, process type, memory, cache, maximum requests per minute (max_rpm), and whether the crew shares its execution data (share_crew).
crewai_introduction.py (crew section)
crew = Crew(
agents=[joke_researcher, joke_writer], #1
tasks=[research_task, write_task], #2
process=Process.sequential, #3
memory=True, #4
cache=True, #5
max_rpm=100, #6
share_crew=True, #7
)
result = crew.kickoff(inputs={"topic": "AI engineer jokes"})
print(result)
When you’re done reviewing, run the file in VS Code (F5), and watch the terminal for conversations and messages from the crew. As you can probably tell by now, the goal of this agent system is to craft jokes related to AI engineering. Here are some of the funnier jokes generated over a few runs of the agent system:
Before you run more iterations of the joke crew, you should read the next section, which shows how to add observability to the multi-agent system.
Observing a complex assemblage such as a multi-agent system is critical to understanding the myriad of problems that can happen. Observability through application tracing is a key element of any complex system, especially one engaged in enterprise use.
CrewAI supports connecting to a specialized agent operations platform appropriately called AgentOps. The platform is generic and designed to provide observability for any LLM-based agent platform. Currently, no pricing or commercialization details are available.
连接到 AgentOps 非常简单,只需安装软件包、获取 API 密钥,然后在您的团队配置中添加一行代码即可。接下来的练习将详细介绍连接和运行 AgentOps 的步骤。
Connecting to AgentOps is as simple as installing the package, getting an API key, and adding a line of code to your crew setup. This next exercise will go through the steps to connect and run AgentOps.
清单 4.14 展示了如何使用 pip 安装 agentops 软件包。您可以单独安装该软件包,也可以将其作为 crewai 软件包的附加组件安装。请记住,AgentOps 还可以连接到其他代理平台以实现可观测性。
Listing 4.14 shows installing the agentops package using pip. You can install the package alone or as an additional component of the crewai package. Remember that AgentOps can also be connected to other agent platforms for observability.
pip install agentops
or as an option with CrewAI
pip install crewai[agentops]
使用 AgentOps 之前,您需要注册一个 API 密钥。以下是撰写本文时注册密钥的一般步骤:
Before using AgentOps, you need to sign up for an API key. Following are the general steps to sign up for a key at the time of writing:
将 API 密钥复制到项目的 .env 文件中。复制 API 密钥后,它应该类似于下面列表中显示的示例。
Copy the API key into the .env file in your project.
After the API key is copied, it should resemble the example shown in the following listing.
.env (adding the AgentOps key)
AGENTOPS_API_KEY="your API key"
现在,我们需要向 CrewAI 脚本添加几行代码。清单 4.16 显示了添加到crewai_agentops.py文件中的代码。创建自己的脚本时,只需添加agentops相应的包,并在使用 CrewAI 时对其进行初始化即可。
Now, we need to add a few lines of code to the CrewAI script. Listing 4.16 shows the additions as they are added to the crewai_agentops.py file. When creating your own scripts, all you need to do is add the agentops package and initialize it when using CrewAI.
crewai_agentops.py (AgentOps additions)
import agentops #1
from crewai import Agent, Crew, Process, Task
from dotenv import load_dotenv
load_dotenv()
agentops.init() #2
在 VS Code 中运行该crewai_agentops.py文件(按 F5),即可像以前一样观察代理的运行情况。但现在,您可以访问 AgentOps 控制面板,查看各个层级的代理交互。
Run the crewai_agentops.py file in VS Code (F5), and watch the agents work as before. However, you can now go to the AgentOps dashboard and view the agent interactions at various levels.
图 4.11 显示了用于运行笑话团队以创作最佳笑话的仪表板。其中统计了多项数据,包括总时长、运行环境、提示和完成令牌、LLM 调用时间以及预估成本。查看成本数据既令人警醒,也表明了代理对话可能会变得多么冗长。
Figure 4.11 shows the dashboard for running the joke crew to create the best joke. Several statistics include total duration, the run environment, prompt and completion tokens, LLM call timings, and estimated cost. Seeing the cost can be both sobering and indicative of how verbose agent conversations can become.
AgentOps平台是任何智能体平台的绝佳补充。虽然它已内置于CrewAI中,但其可观测性功能也可添加到AutoGen或其他框架中,这一点非常实用。AgentOps的另一个吸引人之处在于,它专注于观察智能体交互,而非由机器学习运维平台转型而来。未来,我们很可能会看到更多智能体可观测性模式的涌现。
The AgentOps platform is an excellent addition to any agent platform. While it's built into CrewAI, its observability can also be added to AutoGen or other frameworks, which is helpful. Another attractive aspect of AgentOps is that it's dedicated to observing agent interactions rather than being a retooled machine learning operations platform. In the future, we'll likely see more agent observability patterns emerge.
可观测性平台能够提供的成本观测功能,其优势不容忽视。您是否注意到图 4.11 显示,创建一个笑话的成本略高于 50 美分?智能体功能强大,但成本也可能非常高昂,因此,从实用性和商业化的角度来看,了解这些成本至关重要。
One benefit that can’t be overstated is the cost observation that an observability platform can provide. Did you notice in figure 4.11 that creating a single joke costs a little over 50 cents? Agents can be very powerful, but they can also become very costly, and it’s essential to observe what those costs are in terms of practicality and commercialization.
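Token-based cost estimates like the one AgentOps displays are straightforward to approximate yourself. The sketch below uses illustrative per-1,000-token prices, not current OpenAI rates — check your provider's pricing page for real numbers:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  prompt_price_per_1k=0.01, completion_price_per_1k=0.03):
    """Estimate the cost of LLM usage; prices are per 1,000 tokens (illustrative)."""
    return (prompt_tokens / 1000) * prompt_price_per_1k + \
           (completion_tokens / 1000) * completion_price_per_1k

# A chatty agent run: 40,000 prompt tokens and 5,000 completion tokens
cost = estimate_cost(40_000, 5_000)
print(f"${cost:.2f}")  # $0.55
```

Even at fractions of a cent per 1,000 tokens, a multi-agent conversation that loops through many LLM calls accumulates cost quickly, which is why the dashboard figure can be surprising.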
在本章最后一节中,我们将回到 CrewAI,并再次探讨如何构建能够编写游戏代码的智能体。这将为 AutoGen 和 CrewAI 的功能提供一个极佳的对比。
In the last section of this chapter, we’ll return to CrewAI and revisit building agents that can code games. This will provide an excellent comparison between the capabilities of AutoGen and CrewAI.
比较多智能体平台性能的一个好方法是让机器人执行类似的任务。在接下来的练习中,我们将使用 CrewAI 来模拟游戏编程团队。当然,这种方法也可以应用于其他编程任务。
A great way to compare capabilities between multi-agent platforms is to implement similar tasks in a bot. In this next set of exercises, we’ll employ CrewAI as a game programming team. Of course, this could be adapted to other coding tasks as well.
在 VS Code 中打开crewai_coding_crew.py,我们首先查看清单 4.17 中的代理部分。在这里,我们将创建一个高级工程师、一个 QA 工程师和一个首席 QA 工程师,并赋予他们角色、目标和背景故事。
Open crewai_coding_crew.py in VS Code, and we’ll first review the agent section in listing 4.17. Here, we’re creating a senior engineer, a QA engineer, and a chief QA engineer with a role, goal, and backstory.
crewai_coding_crew.py (agents section)
print("## Welcome to the Game Crew") #1
print("-------------------------------")
game = input("What is the game you would like to build?
↪ What will be the mechanics?\n")
senior_engineer_agent = Agent(
role="Senior Software Engineer",
goal="Create software as needed",
backstory=dedent(
"""
You are a Senior Software Engineer at a leading tech think tank.
Your expertise is in Python programming, and you do your best to
produce perfect code
"""
),
allow_delegation=False,
verbose=True,
)
qa_engineer_agent = Agent(
role="Software Quality Control Engineer",
goal="create perfect code by analyzing the code
↪ that is given for errors",
backstory=dedent(
"""
You are a software engineer that specializes in checking code
for errors. You have an eye for detail and a knack for finding
hidden bugs.
You check for missing imports, variable declarations, mismatched
brackets and syntax errors.
You also check for security vulnerabilities, and logic errors
"""
),
allow_delegation=False,
verbose=True,
)
chief_qa_engineer_agent = Agent(
role="Chief Software Quality Control Engineer",
goal="Ensure that the code does the job that it is supposed to do",
backstory=dedent(
"""
You are a Chief Software Quality Control Engineer at a leading
tech think tank. You are responsible for ensuring that the code
that is written does the job that it is supposed to do.
You are responsible for checking the code for errors and ensuring
that it is of the highest quality.
"""
),
allow_delegation=True, #2
verbose=True,
)
向下滚动文件即可显示代理任务,如清单 4.18 所示。任务描述和预期输出应该很容易理解。同样,每个代理都有特定的任务,以便在完成任务时提供更好的上下文信息。
Scrolling down in the file will display the agent tasks, as shown in listing 4.18. The task descriptions and expected output should be easy to follow. Again, each agent has a specific task to provide better context when working to complete the task.
crewai_coding_crew.py (tasks section)
code_task = Task(
description=f"""
You will create a game using python, these are the instructions:
Instructions
------------
{game} #1
You will write the code for the game using python.""",
expected_output="Your Final answer must be the
↪ full python code, only the python code and nothing else.",
agent=senior_engineer_agent,
)
qa_task = Task(
description=f"""You are helping create a game
↪ using python, these are the instructions:
Instructions
------------
{game} #1
Using the code you got, check for errors. Check for logic errors,
syntax errors, missing imports, variable declarations,
mismatched brackets,
and security vulnerabilities.""",
expected_output="Output a list of issues you found in the code.",
agent=qa_engineer_agent,
)
evaluate_task = Task(
description=f"""You are helping create a game
↪ using python, these are the instructions:
Instructions
------------
{game} #1
You will look over the code to insure that it is complete and
does the job that it is supposed to do. """,
expected_output="Your Final answer must be the
↪ corrected full python code, only the python code and nothing else.",
agent=chief_qa_engineer_agent,
)
最后,我们可以通过查看文件末尾的内容(如清单 4.19 所示)来了解整个过程。这种人员配置与我们之前看到的非常相似。每个代理和任务,以及详细模式和进程属性都会被添加进去。在这个例子中,我们将继续使用顺序方法。
Finally, we can see how this comes together at the bottom of the file, as shown in listing 4.19. This crew configuration is much like what we've seen before. Each agent and task is added, along with the verbose and process attributes. For this example, we'll continue to use sequential processing.
crewai_coding_crew.py (crew section)
crew = Crew(
agents=[senior_engineer_agent,
qa_engineer_agent,
chief_qa_engineer_agent],
tasks=[code_task, qa_task, evaluate_task],
verbose=2,
process=Process.sequential, #1
)
# Get your crew to work!
result = crew.kickoff() #2
print("######################")
print(result)
运行 VS Code (F5) 文件后,系统会提示您输入游戏指令。输入一些指令,例如贪吃蛇游戏或其他您选择的游戏。然后,让程序运行,并观察它们的运行结果。
When you run the file in VS Code (F5), you'll be prompted to enter the instructions for writing a game. Enter some instructions, perhaps the snake game or another game you choose. Then, let the agents work, and observe what they produce.
有了首席质量保证工程师的加入,最终结果通常会比使用 AutoGen 生成的更好,至少在开箱即用的情况下是如此。如果您查看代码,会发现它通常遵循良好的模式,在某些情况下,甚至可能包含测试用例和单元测试。
With the addition of the chief QA engineer, the results will generally look better than what was produced with AutoGen, at least out of the box. If you review the code, you’ll see that it generally follows good patterns and, in some cases, may even include tests and unit tests.
在本章结束之前,我们将对船员的处理模式进行最后一次更改。之前,我们采用的是顺序处理,如图 4.10 所示。图 4.12 展示了 CrewAI 中的分层处理方式。
Before we finish the chapter, we’ll make one last change to the crew’s processing pattern. Previously, we employed sequential processing, as shown in figure 4.10. Figure 4.12 shows what hierarchical processing looks like in CrewAI.
添加此管理器是一个相对简单的过程。清单 4.20 显示了对新文件的额外代码更改,该文件以分层方法使用编码团队。除了从 LangChain 导入用于连接到 OpenAI 的类之外,另一个新增内容是将此类添加为团队管理器manager_llm。
Adding this manager is a relatively simple process. Listing 4.20 shows the additional code changes in a new file that uses the coding crew in a hierarchical process. Aside from importing a class for connecting to OpenAI from LangChain, the other addition is setting this class as the crew manager, manager_llm.
crewai_hierarchy.py (crew manager section)
from langchain_openai import ChatOpenAI #1
crew = Crew(
agents=[senior_engineer_agent,
qa_engineer_agent,
chief_qa_engineer_agent],
tasks=[code_task, qa_task, evaluate_task],
verbose=2,
process=Process.hierarchical, #2
manager_llm=ChatOpenAI( #3
temperature=0, model="gpt-4" #3
), #4
) #4
在 VS Code 中运行此文件(按 F5)。出现提示时,输入您想要创建的游戏。尝试使用您之前用 AutoGen 尝试过的同一个游戏;贪吃蛇游戏也是一个很好的基准示例。观察智能体如何运行代码,并反复检查代码以发现问题。
Run this file in VS Code (F5). When prompted, enter a game you want to create. Try using the same game you tried with AutoGen; the snake game is also a good baseline example. Observe the agents work through the code and review it repeatedly for problems.
运行文件后,您还可以登录 AgentOps 查看此次运行的成本。很可能,使用代理管理器后,成本会是不使用代理管理器时的两倍以上。而且,输出结果可能也不会有显著提升。这就是在不了解问题会如何迅速恶化的情况下构建代理系统的陷阱。
After you run the file, you can also jump on AgentOps to review the cost of this run. Chances are, it will cost over double what it would have without the agent manager. The output will also likely not be significantly better. This is the trap of building agent systems without understanding how quickly things can spiral.
这种螺旋式下降的典型例子是,当智能体不断重复执行相同的操作时,就会出现频繁重复的任务。您可以在 AgentOps 中查看此问题,如图 4.13 所示,方法是查看“重复思考”图。
An example of this spiral that often happens when agents continually iterate over the same actions is frequently repeating tasks. You can view this problem in AgentOps, as shown in figure 4.13, by viewing the Repeat Thoughts plot.
AgentOps 中的“重复思维”图表是衡量智能体系统重复性的绝佳工具。过度重复的思维模式通常意味着智能体不够果断,而是不断尝试生成不同的答案。如果遇到此类问题,则需要更改智能体的处理模式、任务和目标。您甚至可能需要更改系统的智能体类型和数量。
The Repeat Thoughts plot from AgentOps is an excellent way to measure the repetition your agent system encounters. Overly repetitive thought patterns typically mean the agent isn’t being decisive enough and instead keeps trying to generate a different answer. If you encounter this problem, you want to change the agents’ processing patterns, tasks, and goals. You may even want to alter the system’s type and number of agents.
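You can approximate a repeat-thoughts check yourself by counting near-duplicate thought strings across a run. A minimal sketch follows — the whitespace/case normalization here is our own simplification, not how AgentOps computes its plot:

```python
from collections import Counter

def repeated_thoughts(thoughts, threshold=2):
    """Return thoughts occurring at least `threshold` times, ignoring case and whitespace."""
    normalized = [" ".join(t.lower().split()) for t in thoughts]
    counts = Counter(normalized)
    return {t: n for t, n in counts.items() if n >= threshold}

run = [
    "I should write the game loop",
    "I should write the game loop",
    "i should write the   game loop",
    "Now add collision detection",
]
print(repeated_thoughts(run))  # {'i should write the game loop': 3}
```

A high count of repeated thoughts is the signal to revisit the agents' processing pattern, tasks, and goals rather than simply rerunning the crew.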
多智能体系统是按工作模式和任务划分工作的绝佳方式。通常,工作角色会分配给一个智能体角色/人物,而它需要完成的任务可能是隐式的(如 AutoGen),也可能是显式的(如 CrewAI)。
Multi-agent systems are an excellent way to break up work in terms of work patterns of jobs and tasks. Generally, the job role is allocated to an agent role/persona, and the tasks it needs to complete may be implicit, as in AutoGen, or more explicit, as in CrewAI.
本章介绍了许多实用工具和平台,您可以立即使用它们来改善工作、生活等等。至此,我们对多智能体平台的探索之旅告一段落,但这并不意味着我们对多智能体的探索和使用就此结束,我们将在后续章节中继续深入探讨。
In this chapter, we covered many useful tools and platforms that you can use right away to improve your work, life, and more. That completes our journey through multi-agent platforms, but it doesn’t conclude our exploration and use of multiple agents, as we’ll discover in later chapters.
通过以下练习来加深你对相关内容的理解:
Use the following exercises to improve your knowledge of the material:
目标 ——熟悉 AutoGen 中的基本代理通信和设置。
Objective —Familiarize yourself with basic agent communications and setup in AutoGen.
任务:
Tasks:
目标 ——通过增加高级技能来增强代理能力。
Objective —Enhance agent capabilities by adding advanced skills.
任务:
Tasks:
目标 ——探索 CrewAI 中基于角色的任务管理。
Objective —Explore role-based task management in CrewAI.
任务:
Tasks:
目标 ——了解并实施 AutoGen 中的群聊系统,以促进代理协作。
Objective —Understand and implement a group chat system in AutoGen to facilitate agent collaboration.
任务:
Tasks:
目标 ——在 CrewAI 环境中,使用 AgentOps 实现和评估代理的可观测性。
Objective —Implement and evaluate the observability of agents using AgentOps in a CrewAI environment.
任务:
Tasks:
本章我们将探讨如何通过函数实现动作,以及智能体如何使用这些函数。我们将首先了解 OpenAI 的函数调用,然后快速过渡到微软的另一个项目——语义内核 (Semantic Kernel,简称 SK)。我们将使用 SK 来构建和管理智能体的技能和函数,或者让智能体自身也具备这些技能和函数。
In this chapter, we explore actions through the use of functions and how agents can use them as well. We’ll start by looking at OpenAI function calling and then quickly move on to another project from Microsoft called Semantic Kernel (SK), which we’ll use to build and manage skills and functions for agents or as agents.
本章最后,我们将使用 SK 来托管我们的第一个代理系统。本章内容完整,包含大量带注释的代码示例。
We’ll finish the chapter using SK to host our first agent system. This will be a complete chapter with plenty of annotated code examples.
ChatGPT插件最初是为了给用户会话添加功能、技能或工具而推出的。借助插件,用户可以搜索网络、创建电子表格或图表。插件为ChatGPT提供了扩展平台功能的途径。
ChatGPT plugins were first introduced to provide a session with abilities, skills, or tools. With a plugin, you can search the web or create spreadsheets or graphs. Plugins provide ChatGPT with the means to extend the platform.
图 5.1 展示了 ChatGPT 插件的工作原理。在本例中,ChatGPT 中安装了一个新的电影推荐插件。当用户请求 ChatGPT 推荐一部新电影时,大型语言模型 (LLM) 会识别出它有一个插件来处理该操作。然后,它会将用户请求分解成可执行的参数,并将这些参数传递给新的电影推荐器。
Figure 5.1 shows how a ChatGPT plugin works. In this example, a new movie recommender plugin has been installed in ChatGPT. When a user asks ChatGPT to recommend a new movie, the large language model (LLM) recognizes that it has a plugin to manage that action. It then breaks down the user request into actionable parameters, which it passes to the new movie recommender.
推荐器随后抓取展示新电影的网站,并将该信息附加到发送给 LLM 的新提示请求中。LLM 收到此信息后,会向推荐器返回响应,推荐器再将响应传递给 ChatGPT。ChatGPT 随后会向用户返回推荐请求。
The recommender then scrapes a website showcasing new movies and appends that information to a new prompt request to an LLM. With this information, the LLM responds to the recommender, which passes this back to ChatGPT. ChatGPT then responds to the user with the recommended request.
我们可以将插件视为操作的代理。一个插件通常封装了一项或多项功能,例如调用 API 或抓取网站数据。因此,操作是插件的扩展——它们赋予插件相应的功能。
We can think of plugins as proxies for actions. A plugin generally encapsulates one or more abilities, such as calling an API or scraping a website. Actions, therefore, are extensions of plugins—they give a plugin its abilities.
AI代理可以被视为插件以及插件、工具、技能和其他代理的消费者。向代理/插件添加技能、功能和工具,使其能够执行预定义的操作——图5.2突出显示了代理操作发生的位置及其与LLM和其他系统的交互。
AI agents can be considered plugins and consumers of plugins, tools, skills, and other agents. Adding skills, functions, and tools to an agent/plugin allows it to execute well-defined actions—figure 5.2 highlights where agent actions occur and their interaction with LLMs and other systems.
智能体动作是指智能体使用某种功能、技能或工具的能力。容易让人困惑的是,不同的框架使用不同的术语。为了建立一些基本定义,我们将动作定义为智能体可以执行的任何操作。
An agent action is an ability that allows an agent to use a function, skill, or tool. What gets confusing is that different frameworks use different terminology, so to establish some basic definitions, we'll define an action as anything an agent can do.
ChatGPT 插件和函数代表了 ChatGPT 或代理系统可以用来执行额外操作的可操作能力。现在让我们来了解一下 OpenAI 插件的基础和函数定义。
ChatGPT plugins and functions represent an actionable ability that ChatGPT or an agent system can use to perform additional actions. Now let’s examine the basis for OpenAI plugins and the function definition.
OpenAI 通过启用插件,引入了一种结构规范,用于定义 LLM 可以执行的功能/插件之间的接口。该规范正逐渐成为 LLM 系统构建可执行系统的标准。
OpenAI, with the enablement of plugins, introduced a structure specification for defining the interface between functions/plugins an LLM could action. This specification is becoming a standard that LLM systems can follow to provide actionable systems.
这些函数定义现在也被用于定义 ChatGPT 和其他系统的插件。接下来,我们将探讨如何直接通过 LLM 调用来使用这些函数。
These same function definitions are now also being used to define plugins for ChatGPT and other systems. Next, we’ll explore how to use functions directly with an LLM call.
图 5.3 展示了 LLM 如何识别和使用函数定义,将其响应转换为函数调用。
Figure 5.3 demonstrates how an LLM recognizes and uses the function definition to cast its response as the function call.
清单 5.1 展示了使用工具和函数定义调用 LLM API 的详细信息。添加函数定义后,LLM 可以响应函数的输入参数。这意味着 LLM 将识别正确的函数并解析用户请求的相关参数。
Listing 5.1 shows the details of an LLM API call using tools and a function definition. Adding a function definition allows the LLM to reply regarding the function’s input parameters. This means the LLM will identify the correct function and parse the relevant parameters for the user’s request.
first_function.py (the API call)
response = client.chat.completions.create(
model="gpt-4-1106-preview",
messages=[{"role": "system",
"content": "You are a helpful assistant."},
{"role": "user", "content": user_message}],
temperature=0.7,
tools=[ #1
{
"type": "function", #2
"function": {
"name": "recommend",
"description": "Provide a … topic.", #3
"parameters": {
"type": "object", #4
"properties": {
"topic": {
"type": "string",
"description":
"The topic,… for.", #5
},
"rating": {
"type": "string",
"description":
"The rating … given.", #5
"enum": ["good",
"bad",
"terrible"] #6
},
},
"required": ["topic"],
},
},
}
]
)
要了解其工作原理,请在 Visual Studio Code (VS Code) 中打开本书源代码文件夹中的 chapter_4/first_function.py。建议在 VS Code 中打开相关章节文件夹,创建新的 Python 环境并安装 requirements.txt 文件。如果您需要帮助,请参阅附录 B。
To see how this works, open Visual Studio Code (VS Code) to the book’s source code folder: chapter_4/first_function.py. It’s a good practice to open the relevant chapter folder in VS Code to create a new Python environment and install the requirements.txt file. If you need assistance with this, consult appendix B.
开始之前,请确保在 chapter_4 文件夹中正确设置包含 API 凭据的 .env 文件。函数调用是商业 LLM 服务提供的一项额外功能。截至撰写本文时,开源 LLM 部署尚不支持此功能。
Before starting, correctly set up an .env file in the chapter_4 folder with your API credentials. Function calling is an extra capability provided by the LLM commercial service. At the time of writing, this feature wasn’t an option for open source LLM deployments.
接下来,我们将查看 first_function.py 代码底部的内容,如清单 5.2 所示。这里展示了使用清单 5.1 中指定的请求向 LLM 发出的两个调用示例,每个请求都附有运行示例后生成的输出。
Next, we'll look at the bottom of the code in first_function.py, as shown in listing 5.2. These are two examples of calls made to the LLM using the request specified in listing 5.1; each request is shown with the output generated from running the example.
first_function.py (the API in action)
user = "Can you please recommend me a time travel movie?"
response = ask_chatgpt(user) #1
print(response)
###Output
Function(arguments='{"topic":"time travel movie"}',
name='recommend') #2
user = "Can you please recommend me a good time travel movie?"
response = ask_chatgpt(user) #3
print(response)
###Output
Function(arguments='{"topic":"time travel movie",
"rating":"good"}',
name='recommend') #4
在 VS Code 中使用调试器 (F5) 或终端运行first_function.pyPython 脚本,即可看到相同的结果。这里,LLM 会解析输入请求,以匹配任何已注册的工具。在本例中,该工具是单个函数定义,即推荐函数。LLM 从该函数中提取输入参数,并解析请求中的这些参数。然后,它会返回指定的函数及其指定的输入参数。
Run the first_function.py Python script in VS Code using the debugger (F5) or the terminal to see the same results. Here, the LLM parses the input request to match any registered tools. In this case, the tool is the single function definition: the recommend function. The LLM extracts the input parameters from this definition and parses their values from the request. Then, it replies with the named function and designated input parameters.
注意: 实际的函数不会被调用。LLM 只返回建议的函数及其相关的输入参数。必须提取函数名和参数,并将其传递给与函数签名匹配的函数才能执行该操作。我们将在下一节中查看示例。
NOTE The actual function isn’t being called. The LLM only returns the suggested function and the relevant input parameters. The name and parameters must be extracted and passed into a function matching the signature to act on the function. We’ll look at an example of this in the next section.
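In plain terms, acting on the reply means mapping the returned name to a local callable and unpacking the JSON-encoded arguments. A minimal sketch follows — the `llm_reply` dictionary fakes the shape of the model's response, and the local `recommend` stub is purely illustrative:

```python
import json

def recommend(topic, rating="good"):
    # Hypothetical local implementation matching the function signature
    return f"Recommending a {rating} {topic}"

# Shape of what the LLM returns: a function name plus JSON-encoded arguments
llm_reply = {"name": "recommend",
             "arguments": '{"topic": "time travel movie", "rating": "good"}'}

# Dispatch: look up the callable by name and unpack the parsed arguments
available = {"recommend": recommend}
func = available[llm_reply["name"]]
result = func(**json.loads(llm_reply["arguments"]))
print(result)  # Recommending a good time travel movie
```

The dictionary lookup also acts as an allowlist: only functions you register can ever be invoked, no matter what name the model returns.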
既然我们已经了解了LLM不会直接执行函数或插件,那么我们可以来看一个执行工具的示例。为了继续围绕推荐系统展开,我们将看另一个添加Python函数以实现简单推荐的示例。
Now that we understand that an LLM doesn’t execute the function or plugin directly, we can look at an example that executes the tools. Keeping with the recommender theme, we’ll look at another example that adds a Python function for simple recommendations.
图 5.4 展示了这个简单示例的工作原理。我们将提交一个包含工具功能定义的请求,请求三个建议。LLM 会回复三个带有输入参数(时间旅行、食谱和礼物)的函数调用。函数执行结果随后会返回给 LLM,LLM 会将结果转换回自然语言并返回回复。
Figure 5.4 shows how this simple example will work. We’ll submit a single request that includes a tool function definition, asking for three recommendations. The LLM, in turn, will reply with the three function calls with input parameters (time travel, recipe, and gift). The results from executing the functions are then passed back to the LLM, which converts them back to natural language and returns a reply.
现在我们已经理解了这个示例,请parallel_functions.py在 VS Code 中打开它。清单 5.3 显示了您要调用以提供推荐的 Python 函数。
Now that we understand the example, open parallel_functions.py in VS Code. Listing 5.3 shows the Python function that you want to call to give recommendations.
parallel_functions.py (the recommend function)
def recommend(topic, rating="good"):
if "time travel" in topic.lower(): #1
return json.dumps({"topic": "time travel",
"recommendation": "Back to the Future",
"rating": rating})
elif "recipe" in topic.lower(): #1
return json.dumps({"topic": "recipe",
"recommendation": "The best thing … ate.",
"rating": rating})
elif "gift" in topic.lower(): #1
return json.dumps({"topic": "gift",
"recommendation": "A glorious new...",
"rating": rating})
else: #2
return json.dumps({"topic": topic,
"recommendation": "unknown"}) #3
接下来,我们将研究名为 的函数run_conversation,其中所有工作都从请求构造开始。
Next, we’ll look at the function called run_conversation, where all the work starts with the request construction.
parallel_functions.py (run_conversation, the request)
user = """Can you please make recommendations for the following:
1. Time travel movies
2. Recipes
3. Gifts""" #1
messages = [{"role": "user", "content": user}] #2
tools = [ #3
{
"type": "function",
"function": {
"name": "recommend",
"description":
"Provide a recommendation for any topic.",
"parameters": {
"type": "object",
"properties": {
"topic": {
"type": "string",
"description":
"The topic, … recommendation for.",
},
"rating": {
"type": "string",
"description": "The rating … was given.",
"enum": ["good", "bad", "terrible"]
},
},
"required": ["topic"],
},
},
}
]
清单 5.5 展示了请求的发出过程,我们之前已经介绍过,但有几点需要注意。这次调用使用了较低级别的模型,例如 GPT-3.5,因为委托函数是一项更直接的任务,可以使用较旧、成本较低、功能较少的语言模型来完成。
Listing 5.5 shows the request being made, which we've covered before, but there are a few things to note. This call uses a lower-tier model such as GPT-3.5 because delegating functions is a more straightforward task and can be handled by older, cheaper, less sophisticated language models.
parallel_functions.py (run_conversation, the API call)
response = client.chat.completions.create(
model="gpt-3.5-turbo-1106", #1
messages=messages, #2
tools=tools, #2
tool_choice="auto", #3
)
response_message = response.choices[0].message #4
此时,API 调用完成后,响应应该包含所需函数调用的信息。请记住,我们要求 LLM 提供三条建议,这意味着它也应该提供三个函数调用输出,如下面的列表所示。
At this point, after the API call, the response should hold the information for the required function calls. Remember, we asked the LLM to provide us with three recommendations, which means it should also provide us with three function call outputs, as shown in the following listing.
parallel_functions.py (run_conversation, tool_calls)
tool_calls = response_message.tool_calls #1
if tool_calls: #1
available_functions = {
"recommend": recommend,
} #2
# Step 4: send the info for each function call and function response to the model
for tool_call in tool_calls: #3
function_name = tool_call.function.name
function_to_call = available_functions[function_name]
function_args = json.loads(tool_call.function.arguments)
function_response = function_to_call(
topic=function_args.get("topic"), #4
rating=function_args.get("rating"),
)
messages.append( #5
{
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": function_response,
}
) # extend conversation with function response
second_response = client.chat.completions.create( #6
model="gpt-3.5-turbo-1106",
messages=messages,
)
return second_response.choices[0].message.content #6
工具调用输出和推荐函数调用结果会被附加到消息中。请注意,消息现在还包含首次调用的历史记录。然后,这些信息会被传递回语言学习模型(LLM),以构建自然语言回复。
The tool call outputs and the calls to the recommender function results are appended to the messages. Notice how messages now also contain the history of the first call. This is then passed back to the LLM to construct a reply in natural language.
在 VS Code 中打开此文件并按 F5 键即可调试此示例。以下列表显示了运行后的输出parallel_functions.py。
Debug this example in VS Code by pressing the F5 key with the file open. The following listing shows the output of running parallel_functions.py.
parallel_functions.py (output)
Here are some recommendations for you:
1. Time travel movies: "Back to the Future"
2. Recipes: "The best thing you ever ate."
3. Gifts: "A glorious new..." (the recommendation was cut off, so I
couldn't provide the full recommendation)
I hope you find these recommendations helpful! Let me know if you need
more information.
至此,这个简单的演示就完成了。对于更高级的应用,这些函数可以执行各种各样的操作,从抓取网站内容到调用搜索引擎,再到完成更复杂的任务。
This completes this simple demonstration. For more advanced applications, the functions could do any number of things, from scraping websites to calling search engines to completing far more complex tasks.
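As one example, a scraping-style tool could return a page's title for the LLM to summarize. The sketch below uses only the standard library and parses a hard-coded HTML string; the `extract_title` name and behavior are our own invention, not code from the book (a real tool would fetch the HTML with `urllib.request` or a similar client):

```python
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def extract_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()

# In a real tool, this HTML would come from an HTTP request to the target site
html = "<html><head><title>New Movies This Week</title></head></html>"
print(extract_title(html))  # New Movies This Week
```

Wired into the tool-call loop above, the function's JSON result would be appended to the messages just like the recommend output.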
函数是为特定任务生成输出的绝佳方式。然而,处理函数或工具以及进行二次调用的工作可以采用更简洁、更高效的方式。下一节将介绍一种更强大的向代理添加动作的系统。
Functions are an excellent way to cast outputs for a particular task. However, the work of handling functions or tools and making secondary calls can be done in a cleaner and more efficient way. The following section will uncover a more robust system of adding actions to agents.
语义内核 (Semantic Kernel,简称 SK) 是微软的另一个开源项目,旨在帮助构建人工智能应用程序,我们称之为智能体。该项目的核心功能是定义动作,或者平台称之为语义插件,这些插件是对技能和功能的封装。
Semantic Kernel (SK) is another open source project from Microsoft intended to help build AI applications, which we call agents. At its core, the project is best used to define actions, or what the platform calls semantic plugins, which are wrappers for skills and functions.
图 5.5 展示了如何将 SK 用作 OpenAI 插件的插件和使用者。SK 依赖于 OpenAI 插件定义来定义插件。这样,它就可以使用自身或其他插件,并将其发布到其他系统。
Figure 5.5 shows how the SK can be used as a plugin and a consumer of OpenAI plugins. The SK relies on the OpenAI plugin definition to define a plugin. That way, it can consume and publish itself or other plugins to other systems.
OpenAI 插件定义与清单 5.4 中的函数定义完全对应。这意味着 SK 是 API 工具调用(即插件)的协调器。这也意味着 SK 可以帮助组织多个带有聊天界面或代理的插件。
An OpenAI plugin definition maps precisely to the function definitions in listing 5.4. This means that SK is the orchestrator of API tool calls, aka plugins. That also means that SK can help organize multiple plugins with a chat interface or an agent.
注意: SK 团队最初将功能模块称为“技能”(skills)。然而,为了与 OpenAI 保持一致,他们后来将“技能”更名为“插件”(plugins)。更令人困惑的是,代码中仍然使用“技能”一词。因此,在本章中,我们将“技能”和“插件”指代同一概念。
Note The team at SK originally labeled the functional modules as skills. However, to be more consistent with OpenAI, they have since renamed skills to plugins. What is more confusing is that the code still uses the term skills. Therefore, throughout this chapter, we’ll use skills and plugins to mean the same thing.
SK is a useful tool for managing multiple plugins (actions for agents) and, as we’ll see later, can also assist with memory and planning tools. For this chapter, we’ll focus on the actions/plugins. In the next section, we look at how to get started using SK.
SK is easy to install and works within Python, Java, and C#. This is excellent news as it also allows plugins developed in one language to be consumed in a different language. However, you can’t yet develop a native function in one language and use it in another.
We’ll continue from where we left off for the Python environment using the chapter_4 workspace in VS Code. Be sure you have a workspace configured if you want to explore and run any examples.
Listing 5.8 shows how to install SK from a terminal within VS Code. You can also install the SK extension for VS Code. The extension can be a helpful tool to create plugins/skills, but it isn’t required.
pip uninstall semantic-kernel #1
git clone https://github.com/microsoft/semantic-kernel.git #2
cd semantic-kernel/python #3
pip install -e . #4
Once you finish the installation, open SK_connecting.py in VS Code. Listing 5.9 shows a demo of running an example quickly through SK. The example creates a chat completion service using either OpenAI or Azure OpenAI.
SK_connecting.py
import asyncio
import semantic_kernel as sk

selected_service = "OpenAI" #1
kernel = sk.Kernel() #2
service_id = None
if selected_service == "OpenAI":
    from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
    api_key, org_id = sk.openai_settings_from_dot_env() #3
    service_id = "oai_chat_gpt"
    kernel.add_service(
        OpenAIChatCompletion(
            service_id=service_id,
            ai_model_id="gpt-3.5-turbo-1106",
            api_key=api_key,
            org_id=org_id,
        ),
    )
elif selected_service == "AzureOpenAI":
    from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
    deployment, api_key, endpoint = sk.azure_openai_settings_from_dot_env() #4
    service_id = "aoai_chat_completion"
    kernel.add_service(
        AzureChatCompletion(
            service_id=service_id,
            deployment_name=deployment,
            endpoint=endpoint,
            api_key=api_key,
        ),
    )

async def run_prompt():
    result = await kernel.invoke_prompt(
        prompt="recommend a movie about time travel") #5
    print(result)

# Use asyncio.run to execute the async function
asyncio.run(run_prompt()) #6
###Output
One highly recommended time travel movie is "Back to the Future" (1985)
directed by Robert Zemeckis. This classic film follows the adventures of
teenager Marty McFly (Michael J. Fox)…
Run the example by pressing F5 (debugging), and you should see an output similar to listing 5.9. This example demonstrates how a semantic function can be created with SK and executed. A semantic function is the equivalent of a prompt template in prompt flow, another Microsoft tool. In this example, we define a simple prompt as a function.
需要注意的是,这个语义函数并没有被定义为插件。但是,内核可以将其创建为一个独立的语义元素,并针对语言逻辑模型(LLM)执行。语义函数可以单独使用,也可以注册为插件,这一点稍后会详细介绍。接下来,我们进入下一节,介绍上下文变量。
It’s important to note that this semantic function isn’t defined as a plugin. However, the kernel can create the function as a self-contained semantic element that can be executed against an LLM. Semantic functions can be used alone or registered as plugins, as you’ll see later. Let’s jump to the next section, where we introduce contextual variables.
Expanding on the previous example, we can look at adding contextual variables to the semantic function. This pattern of adding placeholders to prompt templates is one we’ll review over and over. In this example, we look at a prompt template that has placeholders for subject, genre, format, and custom.
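Conceptually, the `{{$variable}}` placeholder substitution can be sketched in a few lines of plain Python. This mimics the semantic-kernel template format for illustration only; it is not SK's actual renderer:

```python
import re

def render_template(template: str, **variables) -> str:
    """Substitute {{$name}} placeholders, mimicking the semantic-kernel format."""
    return re.sub(
        r"\{\{\$(\w+)\}\}",
        # Leave the placeholder untouched if no value was supplied
        lambda m: str(variables.get(m.group(1), m.group(0))),
        template,
    )

prompt = "Please recommend a {{$format}} with the subject {{$subject}}."
print(render_template(prompt, format="movie", subject="time travel"))
# -> Please recommend a movie with the subject time travel.
```

SK performs this kind of substitution for you when the function is invoked with `KernelArguments`.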
Open SK_context_variables.py in VS Code, as shown in the next listing. The prompt is equivalent to setting aside a system and user section of the prompt.
SK_context_variables.py
#top section omitted…
prompt = """ #1
system:
You have vast knowledge of everything and can recommend anything provided
you are given the following criteria, the subject, genre, format and any
other custom information.
user:
Please recommend a {{$format}} with the subject {{$subject}} and {{$genre}}.
Include the following custom information: {{$custom}}
"""
prompt_template_config = sk.PromptTemplateConfig( #2
    template=prompt,
    name="tldr",
    template_format="semantic-kernel",
    input_variables=[
        InputVariable(
            name="format",
            description="The format to recommend",
            is_required=True,
        ),
        InputVariable(
            name="subject",
            description="The subject to recommend",
            is_required=True,
        ),
        InputVariable(
            name="genre",
            description="The genre to recommend",
            is_required=True,
        ),
        InputVariable(
            name="custom",
            description="Any custom information to enhance the recommendation",
            is_required=True,
        ),
    ],
    execution_settings=execution_settings,
)
recommend_function = kernel.create_function_from_prompt( #3
    prompt_template_config=prompt_template_config,
    function_name="Recommend_Movies",
    plugin_name="Recommendation",
)

async def run_recommendation( #4
    subject="time travel",
    format="movie",
    genre="medieval",
    custom="must be a comedy",
):
    recommendation = await kernel.invoke(
        recommend_function,
        sk.KernelArguments(subject=subject,
            format=format,
            genre=genre,
            custom=custom), #5
    )
    print(recommendation)

# Use asyncio.run to execute the async function
asyncio.run(run_recommendation()) #5
###Output
One movie that fits the criteria of being about time travel, set in a
medieval period, and being a comedy is "The Visitors" (Les Visiteurs)
from 1993. This French film, directed by Jean-Marie Poiré, follows a
knight and his squire who are transported to the modern era by a
wizard’s spell gone wrong.…
Go ahead and debug this example (F5), and wait for the output to be generated. That is the basis for setting up SK and creating and exercising semantic functions. In the next section, we move on to see how a semantic function can be registered as a skill/plugin.
Semantic functions encapsulate a prompt/profile and execute through interaction with an LLM. Native functions are the encapsulation of code that may perform anything from scraping websites to searching the web. Both semantic and native functions can register as plugins/skills in the SK kernel.
A function, semantic or native, can be registered as a plugin and used the same way we registered the earlier function directly with our API calls. When a function is registered as a plugin, it becomes accessible to chat or agent interfaces, depending on the use case. The next section looks at how a semantic function is created and registered with the kernel.
The VS Code extension for SK provides helpful tools for creating plugins/skills. In this section, we’ll use the SK extension to create a plugin/skill and then edit the components of that extension. After that, we’ll register and execute the plugin in the SK.
Figure 5.6 shows the process for creating a new skill within VS Code using the SK extension. (Refer to appendix B for directions if you need to install this extension.) You’ll then be given the option for the skill/plugin folder to place the function. Always group functions that are similar together. After creating a skill, enter the name and description of the function you want to develop. Be sure to describe the function as if the LLM were going to use it.
You can see the completed skills and functions by opening the skills/plugin folder and reviewing the files. We’ll follow the previously constructed example, so open the skills/Recommender/Recommend_Movies folder, as shown in figure 5.7. Inside this folder is a config.json file, the function description, and the semantic function/prompt in a file called skprompt.txt.
Listing 5.11 shows the contents of the semantic function definition, also known as the plugin definition. Note that the type is marked as completion and not of type function because this is a semantic function. We would define a native function as a type function.
Recommend_Movies/config.json
{
"schema": 1,
"type": "completion", #1
"description": "A function to recommend movies based on users list of
previously seen movies.",
"completion": { #2
"max_tokens": 256,
"temperature": 0,
"top_p": 0,
"presence_penalty": 0,
"frequency_penalty": 0
},
"input": {
"parameters": [
{
"name": "input", #3
"description": "The users list of previously seen movies.",
"defaultValue": ""
}
]
},
"default_backends": []
}
Next, we can look at the definition of the semantic function prompt, as shown in listing 5.12. The format is a little different, but what we see here matches the earlier examples using templating. This prompt recommends movies based on a list of movies the user has previously seen.
Recommend_Movies/skprompt.txt
You are a wise movie recommender and you have been asked to recommend a
movie to a user.
You are provided a list of movies that the user has watched before.
You want to recommend a movie that the user has not watched before.
[INPUT]
{{$input}}
[END INPUT]
Now, we’ll dive into the code that loads the skill/plugin and executes it in a simple example. Open the SK_first_skill.py file in VS Code. The following listing shows an abridged version highlighting the new sections.
kernel = sk.Kernel()
plugins_directory = "plugins"
recommender = kernel.import_plugin_from_prompt_directory(
    plugins_directory,
    "Recommender",
) #1
recommend = recommender["Recommend_Movies"]
seen_movie_list = [ #2
    "Back to the Future",
    "The Terminator",
    "12 Monkeys",
    "Looper",
    "Groundhog Day",
    "Primer",
    "Donnie Darko",
    "Interstellar",
    "Time Bandits",
    "Doctor Strange",
]

async def run():
    result = await kernel.invoke(
        recommend,
        sk.KernelArguments( #3
            settings=execution_settings, input=", ".join(seen_movie_list)
        ),
    )
    print(result)

asyncio.run(run()) #4
###Output
Based on the list of movies you've provided, it seems you have an
interest in science fiction, time travel, and mind-bending narratives.
Given that you've watched a mix of classics and modern films in this
genre, I would recommend the following movie that you have not watched
before:
"Edge of Tomorrow" (also known as "Live Die Repeat: Edge of Tomorrow")…
The code loads the skill/plugin from the skills directory and the plugin folder. When a skill is loaded into the kernel and not just created, it becomes a registered plugin. That means it can be executed directly as is done here or through an LLM chat conversation via the plugin interface.
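The registration idea can be illustrated with a toy registry. This is purely illustrative, not SK's internal data structures: the kernel keeps a mapping from plugin and function names to callables, which is what makes a registered function invokable by name from a chat or agent interface.

```python
# Toy registry mimicking kernel-style plugin registration (illustrative only)
registry: dict[str, dict] = {}

def register(plugin: str, name: str, fn):
    """Record a callable under a plugin/function name pair."""
    registry.setdefault(plugin, {})[name] = fn

def invoke(plugin: str, name: str, **kwargs):
    """Look up a registered function by name and call it."""
    return registry[plugin][name](**kwargs)

register("Recommender", "Recommend_Movies",
         lambda input: f"Recommending based on: {input}")
print(invoke("Recommender", "Recommend_Movies", input="12 Monkeys"))
# -> Recommending based on: 12 Monkeys
```

Once a name-to-callable mapping like this exists, any interface that can produce the plugin and function names (a chat model choosing a tool, for example) can trigger the call.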
Run the code (F5), and you should see an output like listing 5.13. We now have a simple semantic function that can be hosted as a plugin. However, this function requires users to input a complete list of movies they have watched. We’ll look at a means to fix this by introducing native functions in the next section.
As stated, native functions are code that can do anything. In the following example, we’ll introduce a native function to assist the semantic function we built earlier.
This native function will load a list of movies the user has previously seen, from a file. While this function introduces the concept of memory, we’ll defer that discussion until chapter 8. Consider this new native function as any code that could virtually do anything.
Native functions can be created and registered using the SK extension. For this example, we’ll create a native function directly in code to make the example easier to follow.
Open SK_native_functions.py in VS Code. We’ll start by looking at how the native function is defined. A native function is typically defined within a class, which simplifies managing and instantiating native functions.
SK_native_functions.py (MySeenMoviesDatabase)
class MySeenMoviesDatabase:
    """
    Description: Manages the list of the user's seen movies. #1
    """

    @kernel_function( #2
        description="Loads a list of movies … user has already seen",
        name="LoadSeenMovies",
    )
    def load_seen_movies(self) -> str: #3
        try:
            with open("seen_movies.txt", 'r') as file: #4
                lines = [line.strip() for line in file.readlines()]
                comma_separated_string = ', '.join(lines)
                return comma_separated_string
        except Exception as e:
            print(f"Error reading file: {e}")
            return None
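You can sanity-check the file-reading logic on its own, without the SK decorator. The sketch below writes a small seen_movies.txt and repeats the same read-and-join steps; the file contents here are placeholders, not the chapter's actual data file:

```python
from pathlib import Path

# Create a sample file like the one load_seen_movies expects
Path("seen_movies.txt").write_text("The Matrix\nThe Matrix Reloaded\n")

# Same logic as the body of load_seen_movies
with open("seen_movies.txt", 'r') as file:
    lines = [line.strip() for line in file.readlines()]
comma_separated_string = ', '.join(lines)
print(comma_separated_string)  # -> The Matrix, The Matrix Reloaded
```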
With the native function defined, we can see how it’s used by scrolling down in the file, as shown in the following listing.
SK_native_functions.py (remaining code)
plugins_directory = "plugins"
recommender = kernel.import_plugin_from_prompt_directory(
    plugins_directory,
    "Recommender",
) #1
recommend = recommender["Recommend_Movies"]
seen_movies_plugin = kernel.import_plugin_from_object(
    MySeenMoviesDatabase(), "SeenMoviesPlugin"
) #2
load_seen_movies = seen_movies_plugin["LoadSeenMovies"] #3

async def show_seen_movies():
    seen_movie_list = await load_seen_movies(kernel)
    return seen_movie_list

seen_movie_list = asyncio.run(show_seen_movies()) #4
print(seen_movie_list)

async def run(): #5
    result = await kernel.invoke(
        recommend,
        sk.KernelArguments(
            settings=execution_settings,
            input=seen_movie_list),
    )
    print(result)

asyncio.run(run()) #5
###Output
The Matrix, The Matrix Reloaded, The Matrix Revolutions, The Matrix
Resurrections – output from print statement
Based on your interest in the "The Matrix" series, it seems you enjoy
science fiction films with a strong philosophical undertone and action
elements. Given that you've watched all
One important aspect to note is how the native function was imported into the kernel. The act of importing to the kernel registers that function as a plugin/skill. This means the function can be used as a skill from the kernel through other conversations or interactions. We’ll see how to embed a native function within a semantic function in the next section.
There are plenty of powerful features within SK, but one beneficial feature is the ability to embed native or semantic functions within other semantic functions. The following listing shows how a native function can be embedded within a semantic function.
SK_semantic_native_functions.py (sk_prompt)
sk_prompt = """
You are a wise movie recommender and you have been asked to recommend a
movie to a user.
You have a list of movies that the user has watched before.
You want to recommend a movie that
the user has not watched before. #1
Movie List: {{MySeenMoviesDatabase.LoadSeenMovies}}. #2
"""
The next example, SK_semantic_native_functions.py, uses inline native and semantic functions. Open the file in VS Code, and the following listing shows the code to create, register, and execute the functions.
SK_semantic_native_functions.py (excerpt)
prompt_template_config = sk.PromptTemplateConfig(
    template=sk_prompt,
    name="tldr",
    template_format="semantic-kernel",
    execution_settings=execution_settings,
) #1
recommend_function = kernel.create_function_from_prompt(
    prompt_template_config=prompt_template_config,
    function_name="Recommend_Movies",
    plugin_name="Recommendation",
) #2

async def run_recommendation(): #3
    recommendation = await kernel.invoke(
        recommend_function,
        sk.KernelArguments(),
    )
    print(recommendation)

# Use asyncio.run to execute the async function
asyncio.run(run_recommendation())
###Output
Based on the list provided, it seems the user is a fan of the Matrix
franchise. Since they have watched all four existing Matrix movies, I
would recommend a…
Run the code, and you should see an output like listing 5.17. One important aspect to note is that the native function is registered with the kernel, but the semantic function is not. This is important because function creation doesn’t register a function.
For this example to work correctly, the native function must be registered with the kernel, which happens through the import_plugin_from_object call (as in the earlier native-function listing). However, the semantic function itself isn't registered. An easy way to register the function is to make it a plugin and import it.
These simple exercises showcase ways to integrate plugins and skills into chat or agent interfaces. In the next section, we’ll look at a complete example demonstrating adding a plugin representing a service or GPT interface to a chat function.
In chapter 1, we introduced the concept of the GPT interface—a new paradigm in connecting services and other components to LLMs via plugins and semantic layers. SK provides an excellent abstraction for converting any service to a GPT interface.
Figure 5.8 shows a GPT interface constructed around an API service called The Movie Database (TMDB; www.themoviedb.org). The TMDB site provides a free API that exposes information about movies and TV shows.
To follow along with the exercises in this section, you must register for a free account from TMDB and create an API key. Instructions for getting an API key can be found at the TMDB website (www.themoviedb.org) or by asking GPT-4 Turbo or a more recent LLM.
Over the next set of subsections, we’ll create a GPT interface using an SK set of native functions. Then, we’ll use the SK kernel to test the interface and, later in this chapter, implement it as plugins into a chat function. In the next section, we look at building a GPT interface against the TMDB API.
TMDB is an excellent service, but it provides no semantic services or services that can be plugged into ChatGPT or an agent. To do that, we must wrap the API calls that TMDB exposes in a semantic service layer.
A semantic service layer is a GPT interface that exposes functions through natural language. As discussed, to expose functions to ChatGPT or other interfaces such as agents, they must be defined as plugins. Fortunately, SK can create the plugins for us automatically, given that we write our semantic service layer correctly.
A native plugin or set of skills can act as a semantic layer. To create a native plugin, create a new plugin folder, and put a Python file holding a class containing the set of native functions inside that folder. The SK extension currently doesn’t do this well, so manually creating the module works best.
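Under that convention, the layout for the Movies plugin can be created from a terminal. The folder and file names below follow the example this chapter uses:

```shell
# Plugin folder: plugins/<PluginName>/<module>.py
mkdir -p plugins/Movies
# tmdb.py will hold the TMDbService class of native functions
touch plugins/Movies/tmdb.py
```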
Figure 5.9 shows the structure of the new plugin called Movies and the semantic service layer called tmdb.py. For native functions, the parent folder’s name (Movies) is used in the import.
Open the tmdb.py file in VS Code, and look at the top of the file, as shown in listing 5.18. This file contains a class called TMDbService, which exposes several functions that map to API endpoint calls. The idea is to map the various relevant API function calls in this semantic service layer. This will expose the functions as plugins for a chat or agent interface.
tmdb.py (top of file)
from semantic_kernel.functions import kernel_function
import requests
import inspect

def print_function_call(): #1
    #omitted …

class TMDbService: #2
    def __init__(self):
        # enter your TMDb API key here
        self.api_key = "your-TMDb-api-key"

    @kernel_function( #2
        description="Gets the movie genre ID for a given genre name",
        name="get_movie_genre_id",
        input_description="The movie genre name of the genre_id to get",
    )
    def get_movie_genre_id(self, genre_name: str) -> str: #3
        print_function_call()
        base_url = "https://api.themoviedb.org/3"
        endpoint = f"{base_url}/genre/movie/list?api_key={self.api_key}&language=en-US"
        response = requests.get(endpoint) #4
        if response.status_code == 200: #4
            genres = response.json()['genres']
            for genre in genres:
                if genre_name.lower() in genre['name'].lower():
                    return str(genre['id']) #5
        return None
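The genre-lookup loop at the heart of get_movie_genre_id can be exercised in isolation, without a network call, by feeding it a hand-made genres payload. The sample data below is invented for illustration; the real list comes from the TMDB genre endpoint:

```python
def match_genre_id(genres, genre_name):
    """Pure helper mirroring the lookup loop in get_movie_genre_id."""
    for genre in genres:
        # Case-insensitive substring match, as in the service code
        if genre_name.lower() in genre["name"].lower():
            return str(genre["id"])
    return None

sample = [{"id": 28, "name": "Action"}, {"id": 35, "name": "Comedy"}]
print(match_genre_id(sample, "action"))  # -> 28
```

Separating the matching logic this way makes it easy to unit test the semantic layer before wiring it to the live API.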
The bulk of the code for the TMDbService and the functions to call the TMDB endpoints was written with the help of GPT-4 Turbo. Then, each function was wrapped with the kernel_function decorator to expose it semantically.
A few of the TMDB API calls have been mapped semantically. Listing 5.19 shows another example of a function exposed to the semantic service layer. This function pulls a current top 10 list of movies playing for a particular genre.
tmdb.py (get_top_movies_by_genre)
@kernel_function( #1
    description="Gets a list of currently playing movies for a given genre",
    name="get_top_movies_by_genre",
    input_description="The genre of the movies to get",
)
def get_top_movies_by_genre(self, genre: str) -> str:
    print_function_call()
    genre_id = self.get_movie_genre_id(genre) #2
    if genre_id:
        base_url = "https://api.themoviedb.org/3"
        playing_movies_endpoint = f"{base_url}/movie/now_playing?api_key={self.api_key}&language=en-US"
        response = requests.get(
            playing_movies_endpoint) #3
        if response.status_code != 200:
            return ""
        playing_movies = response.json()['results']
        for movie in playing_movies: #4
            movie['genre_ids'] = [str(genre_id)
                for genre_id in movie['genre_ids']]
        filtered_movies = [movie for movie
            in playing_movies if genre_id
            in movie['genre_ids']][:10] #5
        results = ", ".join([movie['title'] for movie in filtered_movies])
        return results
    else:
        return ""
Look through the various other API calls mapped semantically. As you can see, there is a well-defined pattern for converting API calls to a semantic service. Before we run the full service, we’ll test each of the functions in the next section.
In a real-world application, you’ll likely want to write a complete set of unit or integration tests for each semantic service function. We won’t do that here; instead, we’ll write a quick helper script to test the various functions.
Open test_tmdb_service.py in VS Code, and review the code, as shown in listing 5.20. You can comment and uncomment any functions to test them in isolation. Be sure to have only one function uncommented at a time.
test_tmdb_service.py
import semantic_kernel as sk
from plugins.Movies.tmdb import TMDbService
async def main():
    kernel = sk.Kernel()  #1
    tmdb_service = kernel.import_plugin_from_object(
        TMDbService(), "TMDBService")  #2
    print(
        await tmdb_service["get_movie_genre_id"](
            kernel, sk.KernelArguments(
                genre_name="action")  #3
        )
    )  #4
    print(
        await tmdb_service["get_tv_show_genre_id"](
            kernel, sk.KernelArguments(
                genre_name="action")  #5
        )
    )  #6
    print(
        await tmdb_service["get_top_movies_by_genre"](
            kernel, sk.KernelArguments(
                genre_name="action")  #7
        )
    )  #8
    print(
        await tmdb_service["get_top_tv_shows_by_genre"](
            kernel, sk.KernelArguments(
                genre_name="action")  #7
        )
    )
    print(await tmdb_service["get_movie_genres"](
        kernel, sk.KernelArguments()))  #9
    print(await tmdb_service["get_tv_show_genres"](
        kernel, sk.KernelArguments()))  #9

# Run the main function
if __name__ == "__main__":
    import asyncio
    asyncio.run(main())  #10
### Output
Function name: get_top_tv_shows_by_genre #11
Arguments:
self = <skills.Movies.tmdb.TMDbService object at 0x00000159F52090C0>
genre = action
Function name: get_tv_show_genre_id #11
Arguments:
self = <skills.Movies.tmdb.TMDbService object at 0x00000159F52090C0>
genre_name = action
Arcane, One Piece, Rick and Morty, Avatar: The Last Airbender, Fullmetal
Alchemist: Brotherhood, Demon Slayer: Kimetsu no Yaiba, Invincible,
Attack on Titan, My Hero Academia, Fighting Spirit, The Owl House
The real power of SK is shown in this test. Notice how the TMDbService class is imported as a plugin, but we don’t have to define any plugin configurations other than what we already did? By just writing one class that wrapped a few API functions, we’ve exposed part of the TMDB API semantically. Now, with the functions exposed, we can look at how they can be used as plugins for a chat interface in the next section.
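Under the hood, "importing a class as a plugin" amounts to mapping method names to bound callables. The following is an illustrative sketch of that idea, not SK's actual implementation, using a hypothetical stand-in service:

```python
import inspect

class TMDbServiceSketch:
    """Hypothetical stand-in; only the shape of the class matters."""
    def get_movie_genres(self):
        return "Action, Comedy"

    def get_tv_show_genres(self):
        return "Drama, Documentary"

def import_plugin_from_object(obj):
    """Expose every public method as a named, callable plugin function."""
    return {name: fn
            for name, fn in inspect.getmembers(obj, inspect.ismethod)
            if not name.startswith("_")}

plugin = import_plugin_from_object(TMDbServiceSketch())
print(plugin["get_movie_genres"]())  # → Action, Comedy
```

The dictionary-style lookup mirrors how the test script calls `tmdb_service["get_movie_genre_id"]` by name.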
With the TMDB functions exposed semantically, we can move on to integrating them into a chat interface. This will allow us to converse naturally in this interface to get various information, such as current top movies.
Open SK_service_chat.py in VS Code. Scroll down to the start of the new section of code that creates the functions, as shown in listing 5.21. The functions created here are now exposed as plugins, except we filter out the chat function, which we don’t want to expose as a plugin. The chat function here allows the user to converse directly with the LLM and shouldn’t be a plugin.
SK_service_chat.py (function setup)
system_message = "You are a helpful AI assistant."
tmdb_service = kernel.import_plugin_from_object(
    TMDbService(), "TMDBService")  #1

# extracted section of code
execution_settings = sk_oai.OpenAIChatPromptExecutionSettings(
    service_id=service_id,
    ai_model_id=model_id,
    max_tokens=2000,
    temperature=0.7,
    top_p=0.8,
    tool_choice="auto",
    tools=get_tool_call_object(
        kernel, {"exclude_plugin": ["ChatBot"]}),  #2
)
prompt_config = sk.PromptTemplateConfig.from_completion_parameters(
    max_tokens=2000,
    temperature=0.7,
    top_p=0.8,
    function_call="auto",
    chat_system_prompt=system_message,
)  #3
prompt_template = OpenAIChatPromptTemplate(
    "{{$user_input}}", kernel.prompt_template_engine, prompt_config
)  #4
history = ChatHistory()
history.add_system_message("You recommend movies and TV Shows.")
history.add_user_message("Hi there, who are you?")
history.add_assistant_message(
    "I am Rudy, the recommender chat bot. I'm trying to figure out what "
    "people need."
)  #5
chat_function = kernel.create_function_from_prompt(
    prompt_template_config=prompt_template,
    plugin_name="ChatBot",
    function_name="Chat",
)  #6
Next, we can continue by scrolling in the same file to review the chat function, as shown in the following listing.
SK_service_chat.py (chat function)
async def chat() -> bool:
    try:
        user_input = input("User:> ")  #1
    except KeyboardInterrupt:
        print("\n\nExiting chat...")
        return False
    except EOFError:
        print("\n\nExiting chat...")
        return False
    if user_input == "exit":  #2
        print("\n\nExiting chat...")
        return False
    arguments = sk.KernelArguments(  #3
        user_input=user_input,
        history=("\n").join(
            [f"{msg.role}: {msg.content}" for msg in history]),
    )
    result = await chat_completion_with_tool_call(  #4
        kernel=kernel,
        arguments=arguments,
        chat_plugin_name="ChatBot",
        chat_function_name="Chat",
        chat_history=history,
    )
    print(f"AI Agent:> {result}")
    return True
Lastly, scroll down to the bottom of the file, and review the primary function. This is the code that calls the chat function in a loop.
SK_service_chat.py (main function)
async def main() -> None:
    chatting = True
    context = kernel.create_new_context()
    print(
        "Welcome to your first AI Agent"  #1
        "\n Type 'exit' to exit."
        "\n Ask to get a list of currently playing movies by genre."
    )
    while chatting:  #2
        chatting, context = await chat(context)  #3

if __name__ == "__main__":
    asyncio.run(main())
Run the chat interface by running the file (F5), and then ask about movies or television shows of a particular genre. An example conversation session is shown in listing 5.24. This output shows how a request to list movies from two genres caused the chat interface to make multiple calls to the get_top_movies_by_genre function.
SK_service_chat.py (example conversation)
Welcome to your first AI Agent
 Type 'exit' to exit.
 Ask to get a list of currently playing movies by genre.
User:> Input: can you give me list of the current top playing movies for the action and comedy genres?
Function name: get_top_movies_by_genre  #1
Arguments:
  genre = action
Function name: get_movie_genre_id  #2
Arguments:
  genre_name = action
Function name: get_top_movies_by_genre  #1
Arguments:
  genre = comedy
Function name: get_movie_genre_id  #2
Arguments:
  genre_name = comedy
Agent:> Here are the current top-playing movies for the action and comedy genres:

**Action:**  #3
1. The Hunger Games: The Ballad of Songbirds & Snakes
2. Rebel Moon - Part One: A Child of Fire
3. Aquaman and the Lost Kingdom
4. Silent Night
5. The Family Plan
6. Freelance
7. Migration
8. Sound of Freedom
9. Godzilla Minus One

**Comedy:**  #4
1. The Family Plan
2. Wonka
3. Freelance
4. Saltburn
5. Chicken Run: Dawn of the Nugget
6. Trolls Band Together
7. There's Something in the Barn
8. Migration

Please note that some movies may overlap in both genres, such as "The Family Plan" and "Freelance."
Be sure to explore the chat interface’s boundaries and what you can ask for from the TMDB service. For example, try asking for a list of genres for movies or television shows. This service is a good first try, but we can perhaps do better, as we’ll see in the next section.
Now we’ve seen an excellent demonstration of converting an API into a semantic service interface. As it is, the functions return the titles of the top movies and television shows currently playing. However, by just returning the titles, we’re limiting the ability of the LLM to parse the results on its own.
Therefore, we’ll create a v2 version of TMDbService to correct this and return the results as JSON strings. Open the file tmdb_v2.py in VS Code, and scroll down to the get_top_movies_by_genre function.
tmdb_v2.py (get_top_movies_by_genre)
def get_top_movies_by_genre(self, genre: str) -> str:
    print_function_call()
    genre_id = self.get_movie_genre_id(genre)
    if genre_id:
        # same code …
        return json.dumps(filtered_movies)  #1
    else:
        return ""
Now open SK_service_chat.py in VS Code, and comment and uncomment the line shown in listing 5.26. This will then use version 2 of the TMDbService that outputs results as full JSON documents in a single string.
SK_service_chat.py (modified import)
#from skills.Movies.tmdb import TMDbService  #1
from skills.Movies.tmdb_v2 import TMDbService  #2
Rerun the SK_service_chat.py file in VS Code, and alter your query slightly, as shown by the output in the following listing.
SK_service_chat.py (TMDb_v2 service output)
User:> get a list of currently playing movies for the action genre and only return movies about space  #1
Agent:> To find currently playing action movies that are specifically about space, I will need to manually filter the provided list for those that have space-related themes in their overview. Here's what fits that criteria from the list:

1. **Rebel Moon - Part One: A Child of Fire**  #2
   - Release Date: 2023-12-15
   - Overview: When a peaceful colony on the edge of the galaxy finds itself threatened by the armies of the tyrannical Regent Balisarius, they dispatch Kora, a young woman with a mysterious past, to seek out warriors from neighboring planets to help them take a stand.

This is the only movie from the provided list that clearly mentions a space-related theme in its overview.
…
Because the semantic service functions now return the complete movie listing in JSON, the LLM can apply additional filtering. This is the real power of semantic services, allowing you to process the data through the LLM. We won’t see this power by just returning a list of titles.
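The difference between the two service versions is simply how much of the payload reaches the LLM. A quick sketch with made-up data (the titles and overviews are illustrative):

```python
import json

movies = [
    {"title": "Rebel Moon", "overview": "A colony at the edge of the galaxy..."},
    {"title": "Wonka", "overview": "The origin story of a chocolatier..."},
]

def titles_only(results):
    return ", ".join(m["title"] for m in results)  # LLM sees names only

def full_json(results):
    return json.dumps(results)  # LLM can filter, sort, and transform any field

assert titles_only(movies) == "Rebel Moon, Wonka"
assert "galaxy" in full_json(movies)  # overview text survives for filtering
```

With `full_json`, a request like "only return movies about space" has something to filter on; with `titles_only`, that information never reaches the model.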
This last exercise demonstrated the change in mentality you need to make when writing semantic service layers. Generally, you'll want to return as much information as possible. Returning more information takes advantage of the LLM's ability to filter, sort, and transform data independently. In the next chapter, we'll explore building autonomous agents using behavior trees.
Complete the following exercises to improve your knowledge of the material:
Objective —Familiarize yourself with creating a simple plugin for the OpenAI chat completions API.
Tasks:
Objective —Learn to create a plugin that performs a unique task.
Tasks:
Objective —Explore the creation of semantic functions.
Tasks:
Objective —Understand how to combine semantic and native functions.
Tasks:
Objective —Learn to wrap existing web APIs as semantic service plugins.
Tasks:
Now that we’ve covered how actions extend the power/capabilities of agents, we can look at how behavior trees can guide agentic systems. We’ll start by understanding the basics of behavior trees and how they control robotics and AI in games.
We’ll return to agentic actions and examine how actions can be implemented on the OpenAI Assistants platform using the GPT Assistants Playground project. From there, we’ll look at how to build an autonomous agentic behavior tree (ABT) using OpenAI assistants. Then, we’ll move on to understanding the need for controls and guardrails on autonomous agents and using control barrier functions.
In the final section of the chapter, we’ll examine the use of the AgentOps platform to monitor our autonomous behavior-driven agentic systems. This will be an exciting chapter with several challenges. Let’s begin by jumping into the next section, which introduces behavior trees.
Behavior trees are a long-established pattern used to control robotics and AI in games. Rodney A. Brooks first introduced the concept in his “A Robust Layered Control System for a Mobile Robot” paper in 1986. This laid the groundwork for a pattern that expanded on using the tree and node structure we have today.
If you’ve ever played a computer game with nonplayer characters (NPCs) or interacted with advanced robotic systems, you’ve witnessed behavior trees at work. Figure 6.1 shows a simple behavior tree. The tree represents all the primary nodes: selector or fallback nodes, sequence nodes, action nodes, and condition nodes.
Table 6.1 describes the functions and purpose of the primary nodes we’ll explore in this book. There are other nodes and node types, and you can even create custom nodes, but for now, we’ll focus on those in the table.
| Node | Purpose | Function | Type |
|---|---|---|---|
| Selector (fallback) | This node works by selecting the first child that completes successfully. It's often called a fallback node because it keeps falling back to the next child until one executes successfully. | The node calls its children in order and stops executing at the first child that succeeds. It returns success when a child succeeds and failure if no child succeeds. | Composite |
| Sequence | This node executes all of its children in order until one fails or all complete successfully. | The node calls its children in order. It returns success if all children succeed and failure as soon as a single child fails. | Composite |
| Condition | Behavior trees don't use Boolean logic; instead, success and failure are the means of control. A condition returns success when true and failure otherwise. | The node returns success or failure based on a condition. | Task |
| Action | This is where the action happens. | The node executes and returns success if it completes; otherwise, it returns failure. | Task |
| Decorator | These nodes work by controlling the execution of a child node. They're often called conditionals because they can determine whether a node is worth executing or safe to execute. | The node controls the execution of a child node. Decorators can act as control barrier functions to block or prevent unwanted behavior. | Decorator |
| Parallel | This node executes all of its children in parallel. Success or failure is controlled by a threshold for the number of children that must succeed. | The node executes all of its children regardless of their status. | Composite |
The primary nodes in table 6.1 can provide enough functionality to handle numerous use cases. However, understanding behavior trees initially can be daunting. You won’t appreciate their underlying complexity until you start using them. Before we build some simple trees, we want to look at execution in more detail in the next section.
Understanding how behavior trees execute is crucial to designing and implementing behavior trees. Unlike most concepts in computer science, behavior trees operate in terms of success and failure. When a node in a behavior tree executes, it will return either success or failure; this even applies to conditions and selector nodes.
Behavior trees execute from top to bottom and left to right. Figure 6.2 shows the process and what happens if a node fails or succeeds. In the example, the AI the tree controls has a pear but no apple. In the first sequence node, a condition checks whether the AI has an apple. Because the AI doesn't have an apple, the sequence aborts, and execution falls back to the selector. The selector then selects its next child node, another sequence, which checks whether the AI has a pear, and because it does, the AI eats the pear.
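The tick semantics are easy to reproduce without a library. The following minimal sketch (illustrative only, not py_trees) encodes an apple/pear tree like the one in figure 6.2 as nested functions that return success or failure:

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

def condition(fn):
    """A condition returns success when the check is true."""
    return lambda: SUCCESS if fn() else FAILURE

def action(fn):
    """An action runs a side effect and reports success."""
    def run():
        fn()
        return SUCCESS
    return run

def sequence(*children):
    """Fails at the first failing child; succeeds if all succeed."""
    def run():
        for child in children:
            if child() == FAILURE:
                return FAILURE
        return SUCCESS
    return run

def selector(*children):
    """Succeeds at the first succeeding child; fails if none succeed."""
    def run():
        for child in children:
            if child() == SUCCESS:
                return SUCCESS
        return FAILURE
    return run

state = {"apple": False, "pear": True, "eaten": None}
eat = lambda fruit: state.__setitem__("eaten", fruit)

tree = selector(
    sequence(condition(lambda: state["apple"]), action(lambda: eat("apple"))),
    sequence(condition(lambda: state["pear"]), action(lambda: eat("pear"))),
)
assert tree() == SUCCESS and state["eaten"] == "pear"
```

The first sequence aborts at the apple condition, the selector falls back to the second sequence, and the pear is eaten, matching the trace described above.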
Behavior trees provide control over how an AI system will execute at a macro or micro level. Regarding robotics, behavior trees will typically be designed to operate at the micro level, where each action or condition is a small event, such as detecting the apple. Conversely, behavior trees can also control more macro systems, such as NPCs in games, where each action may be a combination of events, like attacking the player.
For agentic systems, behavior trees support controlling an agent or assistant at your chosen level. We'll explore controlling agents at the task level and, in later chapters, at the planning level. After all, with the power of LLMs, agents can construct their own behavior trees.
Of course, several other forms of AI control could be used to control agentic systems. The next section will examine those different systems and compare them to behavior trees.
Numerous other AI control systems have benefits and are worth exploring in controlling agentic systems. They can demonstrate the benefits of behavior trees and provide other options for specific use cases. The behavior tree is an excellent pattern, but it isn’t the only one, and it’s worth learning about others.
Table 6.2 highlights several other systems we may consider for controlling AI systems. Each item in the table describes what the method does, its shortcomings, and its possible application to agentic AI control.
| Control name | Description | Shortcomings | Control agentic AI? |
|---|---|---|---|
| Finite-state machines (FSMs)ᵃ | FSMs model AI using a set of states and transitions triggered by events or conditions. | FSMs can become unmanageable as complexity increases. | FSMs aren't practical for agents because they don't scale well. |
| Decision treesᵇ | Decision trees use a tree-like model of decisions and their possible consequences. | Decision trees can suffer from overfitting and poor generalization in complex scenarios. | Decision trees can be adapted to and enhanced with behavior trees. |
| Utility-based systemsᵇ | Utility functions evaluate and select the best action based on the current situation. | These systems require careful design of the utility functions to balance priorities. | This pattern can be applied within behavior trees. |
| Rule-based systemsᵃ | A set of if-then rules defines the AI behavior. | These systems can become bloated with many rules, leading to possible conflicts. | These methods are less practical when combined with LLM-powered agentic systems. |
| Planning systemsᶜ | Planning systems use planning algorithms to generate a sequence of actions to achieve a specific goal. | These systems are computationally expensive and require extensive domain knowledge. | As we'll see in later chapters, agents can already implement these patterns on their own. |
| Behavior cloningᶜ | Behavior cloning learns policies by imitating expert demonstrations. | The system may struggle to generalize to unseen situations. | This can be incorporated into behavior trees or specific tasks. |
| Hierarchical task networks (HTNs)ᵈ | HTNs break tasks down into smaller, more manageable subtasks arranged hierarchically. | These can become very large and complex to manage and design. | HTNs are better at organizing and executing complex tasks. This pattern may be useful for larger agentic systems. |
| Blackboard systemsᵇ | These systems use collaborative problem solving in which different subsystems cooperate using a shared blackboard. | These systems are difficult to implement, and communication between subsystems is hard to manage. | Agentic systems can implement similar patterns using conversations or group chats/threads. |
| Genetic algorithms (GAs)ᵈ | These optimization techniques are inspired by natural selection and evolve solutions to problems. | GAs are computationally intensive and don't always find the optimal solution. | GAs hold potential and could even be used to optimize behavior trees. |

a. Not practical when considering complex agentic systems; b. found within or easily incorporated into behavior trees; c. typically applied at the task or action/condition level; d. advanced systems that require substantial work when applied to agents.
In later chapters of this book, we’ll investigate some of the patterns discussed in table 6.2. Overall, several patterns can be enhanced or incorporated using behavior trees as the base. While other patterns, such as FSMs, may be helpful for small experiments, they lack the scalability of behavior trees.
Behavior trees can provide several benefits as an AI control system, including scalability. The following list highlights other notable benefits of using behavior trees:
Having made a strong case for behavior trees, we should now consider how to implement them in code. In the next section, we'll look at how to build a simple behavior tree using Python.
Because behavior trees have been around for so long and have been incorporated into many technologies, creating a sample demonstration is very simple. Of course, the easiest way is to ask ChatGPT or your favorite AI chat tool. Listing 6.1 shows the result of using a prompt to generate the code sample and submitting figure 6.1 as the example tree. The final code had to be corrected for simple naming and parameter errors.
Note All the code for this chapter can be found by downloading the GPT Assistants Playground project at https://mng.bz/Ea0q.
first_btree.py
import py_trees
class HasApple(py_trees.behaviour.Behaviour):  #1
    def __init__(self, name):
        super(HasApple, self).__init__(name)

    def update(self):
        if True:
            return py_trees.common.Status.SUCCESS
        else:
            return py_trees.common.Status.FAILURE

# Other classes omitted…

has_apple = HasApple(name="Has apple")  #2
eat_apple = EatApple(name="Eat apple")  #2
sequence_1 = py_trees.composites.Sequence(name="Sequence 1", memory=True)
sequence_1.add_children([has_apple, eat_apple])  #3
has_pear = HasPear(name="Has pear")  #4
eat_pear = EatPear(name="Eat pear")  #4
sequence_2 = py_trees.composites.Sequence(name="Sequence 2", memory=True)
sequence_2.add_children([has_pear, eat_pear])  #3
root = py_trees.composites.Selector(name="Selector", memory=True)
root.add_children([sequence_1, sequence_2])  #3
behavior_tree = py_trees.trees.BehaviourTree(root)  #5
py_trees.logging.level = py_trees.logging.Level.DEBUG
for i in range(1, 4):  #6
    print("\n------------------ Tick {0} ------------------".format(i))
    behavior_tree.tick()  #6
### Start of output
------------------ Tick 1 ------------------
[DEBUG] Selector : Selector.tick()
[DEBUG] Selector : Selector.tick() [!RUNNING->reset current_child]
[DEBUG] Sequence 1 : Sequence.tick()
[DEBUG] Has apple : HasApple.tick()
[DEBUG] Has apple : HasApple.stop(Status.INVALID->Status.SUCCESS)
[DEBUG] Eat apple : EatApple.tick()
Eating apple
[DEBUG] Eat apple : EatApple.stop(Status.INVALID->Status.SUCCESS)
[DEBUG] Sequence 1 : Sequence.stop()[Status.INVALID->Status.SUCCESS]
The code in listing 6.1 represents the behavior tree in figure 6.1. You can run this code as is or alter what the conditions return and then run the tree again. You can also change the behavior tree by removing one of the sequence nodes from the root selector.
Now that we have a basic understanding of behavior trees, we can move on to working with agents/assistants. Before doing that, we’ll look at a tool to help us work with OpenAI Assistants. This tool will help us wrap our first ABTs around OpenAI Assistants.
For the development of this book, several GitHub projects were created to address various aspects of building agents and assistants. One such project, the GPT Assistants Playground, is built using Gradio for the interface that mimics the OpenAI Assistants Playground but with several extras added.
The Playground project was developed as both a teaching and demonstration aid. Inside the project, the Python code uses the OpenAI Assistants API to create a chat interface and an agentic system to build and power assistants. There is also a comprehensive collection of assistant actions you can use, and you can easily add your own.
The following listing shows installing and running the Playground project from the terminal. There is currently no PyPI package to install.
# change to a working folder and create a new Python virtual environment
git clone https://github.com/cxbxmxcx/GPTAssistantsPlayground  #1
cd GPTAssistantsPlayground  #2
pip install -r requirements.txt  #3
You can run the application from the terminal or using Visual Studio Code (VS Code), with the latter giving you more control. Before running the application, you need to set your OpenAI API key through the command line or by creating an .env file, as we’ve done a few times already. Listing 6.3 shows an example of setting the environment variable on Linux/Mac or the Git Bash shell (Windows recommended) and running the application.
export OPENAI_API_KEY="your-api-key"  #1
python main.py  #2
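If you go the .env route instead, a loader just parses KEY=VALUE lines into the process environment. The project presumably relies on a library such as python-dotenv for this; the following standalone sketch only illustrates the mechanics:

```python
import os
import tempfile

def load_env(path):
    """Minimal stand-in for python-dotenv's load_dotenv: parse KEY=VALUE
    lines into os.environ, skipping comments and stripping quotes."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ[key.strip()] = value.strip().strip('"')

# Demonstrate with a throwaway .env file
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write('OPENAI_API_KEY="your-api-key"\n')
load_env(f.name)
assert os.environ["OPENAI_API_KEY"] == "your-api-key"
```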
Open your browser to the URL displayed (typically http://127.0.0.1:7860) or what is mentioned in the terminal. You’ll see an interface similar to that shown in figure 6.3. If you’ve already defined the OpenAI Assistants, you’ll see them in the Select Assistant dropdown.
If you’ve never defined an assistant, you can create one and choose the various options and instructions you need. If you’ve visited the OpenAI Playground, you’ve already experienced a similar interface.
Each of these features is covered in more detail over the next few sections. We’ll start with a look at using and consuming actions in the next section.
Actions and tools are the building blocks that empower agents and assistants. Without access to tools, agents are functionless chatbots. The OpenAI platform is a leader in establishing many of the patterns for tools, as we saw in chapter 3.
The Playground provides several custom actions that can be attached to assistants through the interface. In this next exercise, we’ll build a simple assistant and attach a couple of custom actions to see what is possible.
Figure 6.4 shows the expanded Actions accordion, which displays many available custom actions. Run the Playground from the terminal or debugger, and create a new assistant. Then, select the actions shown in the figure. After you’re done selecting the actions, scroll to the bottom, and click Add Assistant to add the assistant. Assistants need to be created before they can be used.
After you create the assistant, you can ask it to list all available assistants. Listing the assistants also gives you the IDs required to call the assistant. You can also call other assistants and ask them to complete tasks in their area of specialization.
Adding your custom actions is as simple as adding code to a file and dropping it in the right folder. Open the playground/assistant_actions folder from the main project folder, and you’ll see several files that define the various actions. Open the file_actions.py file in VS Code, as shown in listing 6.4.
playground/assistant_actions/file_actions.py
import os
from playground.actions_manager import agent_action
OUTPUT_FOLDER = "assistant_outputs"
@agent_action #1
def save_file(filename, content): #2
"""
Save content to a file. #3
:param filename: The name of the file including extension.
:param content: The content to save in the file.
"""
file_path = os.path.join(OUTPUT_FOLDER, filename)
with open(file_path, "w", encoding="utf-8") as file:
file.write(content)
print(f"File '{filename}' saved successfully.") #4
You can add any custom action you want by placing the file in the assistant_actions folder and decorating it with the agent_action decorator. Just make sure to give the function a good name and enter quality documentation for how the function should be used. When the Playground starts up, it loads all the actions in the folder that are decorated correctly and have descriptions/documentation.
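The mechanics behind this pattern can be sketched in plain Python. The following is a hypothetical illustration of how a decorator-based registry might discover actions and capture their docstrings as tool descriptions; the real `agent_action` decorator in `playground/actions_manager.py` may be implemented differently.

```python
import functools

# Module-level registry populated as decorated functions are imported.
ACTION_REGISTRY = {}

def agent_action(func):
    """Register a function as an assistant action, keeping its docstring
    so it can be exposed to the LLM as a tool description."""
    ACTION_REGISTRY[func.__name__] = {
        "callable": func,
        "description": (func.__doc__ or "").strip(),
    }
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@agent_action
def save_file(filename, content):
    """Save content to a file."""
    with open(filename, "w", encoding="utf-8") as f:
        f.write(content)

print(sorted(ACTION_REGISTRY))  # actions discovered at import time
print(ACTION_REGISTRY["save_file"]["description"])
```

This is also why the function name and documentation matter so much: in a scheme like this, they are all the LLM sees when deciding whether and how to call the action.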
It’s that simple. You can add several custom actions as needed. In the next section, we’ll look at a special custom action that allows the assistant to run code locally.
To run several of the examples in this chapter, you’ll need to install the assistants database. Fortunately, this can be easily done through the interface and just by asking agents. The upcoming instructions detail the process for installing the assistants and are taken directly from the GPT Assistants Playground README. You can install several of the demo assistants located in the assistants.db SQLite database:
create_manager_assistant action (found under the Actions section). assistants.db database.
Getting agents and assistants to generate and run executable code has a lot of power. Unlike the Code Interpreter, running code locally provides numerous opportunities to iterate and tune quickly. We saw this earlier with AutoGen, where the agents could keep running the code until it worked as expected.
In the Playground, it’s a simple matter to select the custom action run_code, as shown in figure 6.5. You’ll also want to choose the run_shell_command action because it allows the assistant to pip install any required modules.
You can now ask an assistant to generate and run code on your behalf to be sure it works. Try this out by adding the custom actions and asking the assistant to generate and run code, as shown in figure 6.6. If the code doesn’t work as expected, tell the assistant what problems you encountered.
Again, the Python code running in the Playground creates a new virtual environment in a project subfolder. This system works well if you’re not running any operating system–level code or low-level code. If you need something more robust, a good option is AutoGen, which uses Docker containers to run isolated code.
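The Playground’s actual `run_code` action isn’t shown here, but the core idea can be sketched with the standard library: write the generated code to a file, execute it in a subprocess, and hand stdout/stderr back to the assistant so it can iterate on errors. This is a minimal sketch, assuming the current interpreter stands in for the project’s dedicated virtual environment.

```python
import subprocess
import sys
import tempfile

def run_code(code: str, timeout: int = 30) -> str:
    """Write generated code to a temp file, execute it in a subprocess,
    and return combined stdout/stderr so the caller can see failures."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # sys.executable points at the current interpreter; a real implementation
    # would point at the Playground's isolated virtual environment instead.
    result = subprocess.run(
        [sys.executable, path],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout + result.stderr

print(run_code("print(1 + 1)"))  # prints 2
```

Returning the traceback text, not just a failure flag, is what lets the assistant diagnose and fix its own code on the next attempt.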
Adding actions to run code or other tasks can make assistants feel like a black box. Fortunately, the OpenAI Assistants API allows you to consume events and see what the assistant is doing behind the scenes. In the next section, we’ll see what this looks like.
OpenAI added a feature into the Assistants API that allows you to listen to events and actions chained through tool/action use. This feature has been integrated into the Playground, capturing action and tool use when an assistant calls another assistant.
We can try this by asking an assistant to use a tool and then open the log. A great example of how you can do this is by giving an assistant the Code Interpreter tool and then asking it to plot an equation. Figure 6.7 shows an example of this exercise.
Usually, when the Assistant Code Interpreter tool is enabled, you don’t see any code generation or execution. This feature allows you to see all tools and actions used by the assistant as they happen. Not only is it an excellent tool for diagnostics, but it also provides additional insights into the functions of LLMs.
We haven’t reviewed the code to do all this because it’s extensive and will likely undergo several changes. That being said, if you plan on working with the Assistants API, this project is a good place to start. With the Playground introduced, we can continue our journey into ABTs in the next section.
Agentic behavior trees (ABTs) implement behavior trees on top of assistant and agent systems. The key difference between regular behavior trees and ABTs is that ABTs use prompts to direct actions and conditions. Because prompts can return highly variable results, we could also call these trees stochastic behavior trees, which do exist. For simplicity, we’ll differentiate behavior trees used to control agents by referring to them as agentic.
Next, we’ll undertake an exercise to create an ABT. The finished tree will be written in Python but will require the setup and configuration of various assistants. We’ll cover how to manage assistants using the assistants themselves.
Fortunately, the Playground can help us quickly manage and create the assistants. We’ll first install the Manager Assistant, followed by the predefined assistants. Let’s get started with installing the Manager Assistant using the following steps:
create_manager_assistant action. You don’t need to save; the interface will update the assistant automatically.
The Manager Assistant is like an admin that has access to everything. When engaging the Manager Assistant, be sure to be specific about your requests. With the Manager Assistant active, you can now install new assistants used in the book using the following steps:
Please list all the installable assistants.
Please install the Python Coding Assistant.
You can manage and install any available assistants using the Playground. You can also ask the Manager Assistant to save the definitions of all your assistants as JSON:
Please save all the assistants as JSON to a file called assistants.json.
The Manager Assistant can access all actions, which should be considered unique and used sparingly. When crafting assistants, it’s best to keep them goal specific and limit the actions to just what they need. This not only avoids giving the AI too many decisions but also avoids accidents or mistakes caused by hallucinations.
As we go through the remaining exercises in this chapter, you’ll likely need to install the required assistants. Alternatively, you can ask the Manager Assistant to install all available assistants. Either way, we look at creating an ABT with assistants in the next section.
Coding challenges provide a good baseline for testing and evaluating agent and assistant systems. Challenges and benchmarks can quantify how well an agent or agentic system operates. We already applied coding challenges to multi-platform agents in chapter 4 with AutoGen and CrewAI.
For this coding challenge, we’re going a little further and looking at Python coding challenges from the Edabit site (https://edabit.com), which range in complexity from beginner to expert. We’ll stick with the expert code challenges because GPT-4o and other models are excellent coders. Look at the challenge in the next listing, and think about how you would solve it.
Plant the Grass by AniXDownLoe
You will be given a matrix representing a field g
and two numbers x, y coordinate.
There are three types of possible characters in the matrix:
x representing a rock.
o representing a dirt space.
+ representing a grassed space.
You have to simulate grass growing from the position (x, y).
Grass can grow in all four directions (up, left, right, down).
Grass can only grow on dirt spaces and can't go past rocks.
Return the simulated matrix.
Examples
simulate_grass([
"xxxxxxx",
"xooooox",
"xxxxoox"
"xoooxxx"
"xxxxxxx"
], 1, 1) → [
"xxxxxxx",
"x+++++x",
"xxxx++x"
"xoooxxx"
"xxxxxxx"
]
Notes
There will always be rocks on the perimeter
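Before handing the challenge to the assistants, it helps to know what a correct answer looks like. One straightforward approach is an iterative flood fill; the sketch below is one possible solution, not the code the assistants will generate.

```python
def simulate_grass(g, x, y):
    """Flood-fill grass ('+') from (x, y) across dirt ('o'), blocked by rocks ('x')."""
    grid = [list(row) for row in g]
    stack = [(x, y)]
    while stack:
        cx, cy = stack.pop()
        if grid[cy][cx] == "o":
            grid[cy][cx] = "+"
            # Grow up, left, right, and down; the perimeter is always rock
            # (per the challenge notes), so no bounds checks are needed.
            stack.extend([(cx, cy - 1), (cx - 1, cy), (cx + 1, cy), (cx, cy + 1)])
    return ["".join(row) for row in grid]

print(simulate_grass(
    ["xxxxxxx", "xooooox", "xxxxoox", "xoooxxx", "xxxxxxx"], 1, 1))
# → ['xxxxxxx', 'x+++++x', 'xxxx++x', 'xoooxxx', 'xxxxxxx']
```

Note the coordinate convention: the challenge’s test cases index the grid as `grid[y][x]`, which is easy to get backward and is exactly the kind of subtle bug the Judge and Verifier assistants exist to catch.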
You can use any challenge or coding exercise you want, but here are a few things to consider:
Along with any challenge, you’ll also want a set of tests or assertions to confirm the solution works. On Edabit, a challenge typically provides a comprehensive set of tests. The following listing shows the additional tests provided with the challenge.
Test.assert_equals(simulate_grass(
["xxxxxxx","xooooox","xxxxoox","xoooxxx","xxxxxxx"],
1, 1),
["xxxxxxx","x+++++x","xxxx++x","xoooxxx","xxxxxxx"])
Test.assert_equals(simulate_grass(
["xxxxxxx","xoxooox","xxoooox","xooxxxx",
"xoxooox","xoxooox","xxxxxxx"],
2, 3), ["xxxxxxx","xox+++x","xx++++x","x++xxxx",
"x+xooox","x+xooox","xxxxxxx"])
Test.assert_equals(simulate_grass(
["xxxxxx","xoxoox","xxooox","xoooox","xoooox","xxxxxx"],
1, 1),
["xxxxxx","x+xoox","xxooox","xoooox","xoooox","xxxxxx"])
Test.assert_equals(simulate_grass(
["xxxxx","xooox","xooox","xooox","xxxxx"],
1, 1),
["xxxxx","x+++x","x+++x","x+++x","xxxxx"])
Test.assert_equals(simulate_grass(
["xxxxxx","xxxxox","xxooox","xoooxx","xooxxx",
"xooxxx","xxooox","xxxoxx","xxxxxx"],
4, 1),
["xxxxxx","xxxx+x","xx+++x","x+++xx","x++xxx",
"x++xxx","xx+++x","xxx+xx","xxxxxx"])
Test.assert_equals(simulate_grass(
["xxxxxxxxxxx", "xoxooooooox", "xoxoxxxxxox",
"xoxoxoooxox", "xoxoxoxoxox", "xoxoxoxoxox",
"xoxoxxxoxox", "xoxoooooxox", "xoxxxxxxxox",
"xooooooooox", "xxxxxxxxxxx"], 1, 1),
["xxxxxxxxxxx", "x+x+++++++x", "x+x+xxxxx+x",
"x+x+x+++x+x", "x+x+x+x+x+x", "x+x+x+x+x+x",
"x+x+xxx+x+x", "x+x+++++x+x", "x+xxxxxxx+x",
"x+++++++++x", "xxxxxxxxxxx"])
The tests will be run as part of a two-step verification to confirm that the solution works. We’ll also use the tests and challenges as written, which will further test the AI.
Figure 6.8 shows the makeup of a straightforward behavior tree that will be used to solve various programming challenges. You’ll notice that this ABT uses a different assistant for the actions and conditions. For the first step, the Python coding assistant (called the Hacker) generates a solution that is then reviewed by the coding challenge Judge (called the Judge), which produces a refined solution that is verified by a different Python coding assistant (called the Verifier).
Figure 6.8 also shows how each agent converses on which thread. Assistants use message threads, similar to a Slack or Discord channel, where all assistants conversing on a thread will see all messages. For this ABT, we keep one main conversation thread for the Hacker and Judge to share messages, while the Verifier works on a separate message thread. Keeping the Verifier on its own thread isolates it from the noise of the solution-solving efforts.
Now, building the ABT in code is a matter of combining the py_trees package and the Playground API functions. Listing 6.7 shows an excerpt of code that creates each of the action/condition nodes with the assistants and gives them the instructions.
agentic_btree_coding_challenge.py
root = py_trees.composites.Sequence("RootSequence", memory=True)
thread = api.create_thread() #1
challenge = textwrap.dedent("""
#2
""")
judge_test_cases = textwrap.dedent("""
#3
""")
hacker = create_assistant_action_on_thread(
thread=thread, #4
action_name="Hacker",
assistant_name="Python Coding Assistant",
assistant_instructions=textwrap.dedent(f"""
Challenge goal:
{challenge} #5
Solve the challenge and output the
final solution to a file called solution.py
"""),
)
root.add_child(hacker)
judge = create_assistant_action_on_thread(
thread=thread, #6
action_name="Judge solution",
assistant_name="Coding Challenge Judge",
assistant_instructions=textwrap.dedent(
f"""
Challenge goal:
{challenge} #7
Load the solution from the file solution.py.
Then confirm it is a solution to the challenge
and test it with the following test cases:
{judge_test_cases} #8
Run the code for the solution and confirm it passes all the test cases.
If the solution passes all tests save the solution to a file called
judged_solution.py
""",
),
)
root.add_child(judge)
# verifier operates on a different thread, essentially in closed room
verifier = create_assistant_condition( #9
condition_name="Verify solution",
assistant_name="Python Coding Assistant",
assistant_instructions=textwrap.dedent(
f"""
Challenge goal:
{challenge} #10
Load the file called judged_solution.py and
verify that the solution is correct by running the code and confirm it passes
all the test cases:
{judge_test_cases} #11
If the solution is correct, return only the single word SUCCESS, otherwise
return the single word FAILURE.
""",
),
)
root.add_child(verifier)
tree = py_trees.trees.BehaviourTree(root)
while True:
tree.tick()
time.sleep(20) #12
if root.status == py_trees.common.Status.SUCCESS: #13
break
### Required assistants –
### Python Coding Assistant and Coding Challenge Judge
### install these assistants through the Playground
Run the ABT by loading the file in VS Code or using the command line. Follow the output in the terminal, and watch how the assistants work through each step in the tree.
If the solution fails to be verified at the condition node, the process will continue per the tree. Even with this simple solution, you could quickly create numerous variations. You could extend the tree with more nodes/steps and subtrees. Perhaps you want a team of Hackers to break down and analyze the challenge, for example.
This example’s work is done mainly with the Playground code, using the helper functions create_assistant_condition and create_assistant_action_on_thread. This code uses a couple of classes to integrate the py_trees behavior tree code and the OpenAI Assistants code wrapped in the Playground. Review the code within the project if you want to understand the lower-level details.
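The core of that integration can be sketched without py_trees or the Assistants API: wrap each assistant call in a node that reports success or failure, and let a sequence re-tick until every child succeeds. The class and function names below are illustrative, not the Playground’s actual helpers.

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class AssistantAction:
    """Wraps a callable standing in for an assistant run; returns a status."""
    def __init__(self, name, assistant_fn):
        self.name = name
        self.assistant_fn = assistant_fn
    def tick(self):
        return SUCCESS if self.assistant_fn() else FAILURE

class Sequence:
    """Ticks children in order; fails fast, succeeds only if all succeed."""
    def __init__(self, children):
        self.children = children
    def tick(self):
        for child in self.children:
            if child.tick() == FAILURE:
                return FAILURE
        return SUCCESS

# A "flaky" verifier that fails on its first attempt, like a solution
# that doesn't pass the test cases until it has been refined once.
attempts = {"count": 0}
def flaky_verifier():
    attempts["count"] += 1
    return attempts["count"] > 1

tree = Sequence([
    AssistantAction("Hacker", lambda: True),
    AssistantAction("Verifier", flaky_verifier),
])
while tree.tick() != SUCCESS:  # keep ticking until the whole sequence passes
    pass
print(attempts["count"])  # → 2
```

This mirrors the `while True: tree.tick()` loop in listing 6.7: the sequence root keeps driving the Hacker/Judge/Verifier chain until the verification condition finally returns success.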
We already looked at conversational multi-agent systems in chapter 4 when we looked at AutoGen. The ABT can work using a combination of conversations (over threads) and other methods, such as file sharing. Having your assistants/agents pass files around helps reduce the number of noisy and repetitive thoughts/conversations. In contrast, conversational systems benefit from potential emergent behaviors. So, using both can help evolve better control and solutions.
The simple solution in listing 6.7 could be extended to handle more real-world coding challenges and perhaps even to work as a coding ABT. In the next section, we build a different ABT to handle a different problem.
In this section’s exercise, we look at an ABT that can do the following:
Figure 6.9 shows the ABT assembled with each of the different assistants. In this exercise, we use a sequence node for the root, and each assistant performs a different action. Also, to keep things simple, each assistant interaction will always occur in a new thread. This isolates each assistant’s interaction into a concise conversation that’s easier to debug if something goes wrong.
If you plan to run the code in this exercise, you must add your X credentials to the .env file. The .env.default file shows how the credentials need to be set, as shown in listing 6.8. You don’t have to enter your credentials; in that case, the last step, posting, will fail, but you can still look at the file (youtube_twitter_post.txt) to see what was generated.
X_EMAIL = "twitter email here" X_USERNAME = "twitter username here" X_PASSWORD = "twitter password here"
Listing 6.9 shows just the code for creating the assistant actions. This ABT uses three different assistants, each with its own task instructions. Note that each assistant has a unique set of instructions defining its role. You can review the instructions for each assistant by using the Playground.
agentic_btree_video_poster_v1.py
root = py_trees.composites.Sequence("RootSequence", memory=True)
search_term = "GPT Agents"
search_youtube_action = create_assistant_action(
action_name=f"Search YouTube({search_term})",
assistant_name="YouTube Researcher v2",
assistant_instructions=f"""
Search Term: {search_term}
Use the query "{search_term}" to search for videos on YouTube.
then for each video download the transcript and summarize it
for relevance to {search_term}
be sure to include a link to each of the videos,
and then save all summarizations to a file called youtube_transcripts.txt
If you encounter any errors, please return just the word FAILURE.
""",
)
root.add_child(search_youtube_action)
write_post_action = create_assistant_action(
action_name="Write Post",
assistant_name="Twitter Post Writer",
assistant_instructions="""
Load the file called youtube_transcripts.txt,
analyze the contents for references to search term at the top and
then select
the most exciting and relevant video related to:
educational, entertaining, or informative, to post on Twitter.
Then write a Twitter post that is relevant to the video,
and include a link to the video, along
with exciting highlights or mentions,
and save it to a file called youtube_twitter_post.txt.
If you encounter any errors, please return just the word FAILURE.
""",
)
root.add_child(write_post_action)
post_action = create_assistant_action(
action_name="Post",
assistant_name="Social Media Assistant",
assistant_instructions="""
Load the file called youtube_twitter_post.txt and post the content
to Twitter.
If the content is empty please do not post anything.
If you encounter any errors, please return just the word FAILURE.
""",
)
root.add_child(post_action)
### Required assistants – YouTube Researcher v2, Twitter Post Writer,
### and Social Media Assistant – install these assistants through the Playground
Run the code as you normally would, and after a few minutes, a new post will appear in the assistants_output folder. Figure 6.10 shows an example of a post generated using this ABT. Running this ABT to generate more than a few posts a day could, and likely will, get your X account blocked. If you’ve configured X credentials, you’ll see the post appear on your feed.
This ABT is shown for demonstration purposes and isn’t for production or long-term use. The demonstration is meant to show searching for and loading data, summarizing and filtering it, generating new content, and integrating multiple custom actions with external APIs.
The conversational aspect of multi-agent systems can drive mechanisms such as feedback, reasoning, and emergent behaviors. Driving agents with ABTs that silo assistants/agents can be effective for controlling structured processes, as we saw in the YouTube posting example. However, we also don’t want to miss out on the benefits of conversation across agents/assistants.
Fortunately, the Playground provides methods to silo or join assistants to conversation threads. Figure 6.11 shows how assistants can be siloed or mixed in various combinations to threads. Combining silos with conversation provides the best of both patterns.
We’ll examine a simple but practical exercise to demonstrate the effectiveness of the conversational pattern. For the next exercise, we’ll employ two assistants in an ABT that converse over the same thread. The next listing shows the tree’s construction in code with the respective assistants.
agentic_conversation_btree.py
root = py_trees.composites.Sequence("RootSequence", memory=True)
bug_file = """
# code not shown
"""
thread = api.create_thread() #1
debug_code = create_assistant_action_on_thread( #2
thread=thread,
action_name="Debug code",
assistant_name="Python Debugger",
assistant_instructions=textwrap.dedent(f"""
Here is the code with bugs in it:
{bug_file}
Run the code to identify the bugs and fix them.
Be sure to test the code to ensure it runs without errors or throws
any exceptions.
"""),
)
root.add_child(debug_code)
verify = create_assistant_condition_on_thread( #3
thread=thread,
condition_name="Verify",
assistant_name="Python Coding Assistant",
assistant_instructions=textwrap.dedent(
"""
Verify the solution fixes the bug and there are no more issues.
Verify that no exceptions are thrown when the code is run.
Reply with SUCCESS if the solution is correct, otherwise return FAILURE.
If you are happy with the solution, save the code to a file called
fixed_bug.py.
""",
),
)
root.add_child(verify)
tree = py_trees.trees.BehaviourTree(root)
while True:
tree.tick()
if root.status == py_trees.common.Status.SUCCESS:
break #4
time.sleep(20)
Three nodes comprise the tree: the root sequence, the debug code action, and the verify fix condition. Because the tree’s root is a sequence, the two assistants will continue to work one after another until they both return with success. Both assistants converse on the same thread and yet are controlled in a manner that provides constant feedback.
Run the exercise by loading the file in VS Code, or execute it directly from the command line. The example code has a few minor bugs and problems that the assistants will work through to fix. After the ABT completes running successfully, you can open the assistants_output/fixed_bug.py file and verify the results are all good.
We’ve now seen a couple of ABTs in action and understand the nuances of using silos or conversations. The following section will teach you some techniques for building your own ABTs.
Back chaining is a method derived from logic and reasoning used to help build behavior trees by working backward from the goal. This section will use the back chaining process to construct an ABT that works to achieve the goal. The following list provides a description of the process in more detail:
Behavior trees typically use a pattern called the blackboard to communicate across nodes. Blackboards, like those in py_trees, use a key/value store to save information and make it accessible across nodes. A blackboard can also provide several controls, such as limiting access to specific nodes.
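As a toy illustration of the pattern, the sketch below implements a key/value blackboard with per-client write permissions. This is not py_trees’ actual Blackboard API; it only shows the idea of shared state with access control.

```python
class Blackboard:
    """Toy key/value blackboard with per-key write permissions."""
    def __init__(self):
        self._data = {}
        self._writers = {}
    def register(self, key, writer):
        # Only the named client may write this key; anyone may read it.
        self._writers[key] = writer
    def set(self, key, value, client):
        if self._writers.get(key) != client:
            raise PermissionError(f"{client} cannot write {key}")
        self._data[key] = value
    def get(self, key):
        return self._data[key]

bb = Blackboard()
bb.register("solution_path", writer="Hacker")
bb.set("solution_path", "solution.py", client="Hacker")
print(bb.get("solution_path"))  # → solution.py
```

In a real tree, the node that produces an artifact (the Hacker) would write the key, while downstream nodes (the Judge, the Verifier) would only read it.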
We opted to use files for communication because of their simplicity and transparency. At some point, agentic systems will be expected to consume much more information, and in more formats, than blackboards were designed for. Blackboards must either become more sophisticated or be integrated with file storage solutions.
Let’s build an ABT using back chaining. We could tackle a variety of goals, but one interesting and perhaps meta goal is to build an ABT that helps build assistants. So let’s first present our goal as a statement “Create an assistant that can help me do {task}”:
Of course, the simple solution to building the tree now is to ask ChatGPT or another capable model. The result of asking ChatGPT to make the tree is shown in the next listing. You could also work the tree out independently and perhaps introduce other elements.
Root
│
├── Sequence
│   ├── Action: Give the assistant relevant instructions to help a user with a given task
│   ├── Action: Name the assistant
│   ├── Action: Test the assistant
│   ├── Condition: Verify the assistant
│   └── Action: Create the assistant
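The ordering this sequence encodes can be derived mechanically. The following toy routine, whose rule names are invented for illustration, back-chains from the goal by recursively visiting each state's preconditions before the state itself:

```python
# Invented precondition rules for the "create an assistant" goal:
# each state lists the states that must hold before it can be achieved.
rules = {
    "assistant_created": ["assistant_verified"],
    "assistant_verified": ["assistant_tested"],
    "assistant_tested": ["assistant_named"],
    "assistant_named": ["instructions_written"],
    "instructions_written": [],
}

def back_chain(goal):
    """Work backward from the goal, emitting steps in execution order."""
    plan = []
    def visit(state):
        for precondition in rules.get(state, []):
            visit(precondition)
        plan.append(state)
    visit(goal)
    return plan

print(back_chain("assistant_created"))
# ['instructions_written', 'assistant_named', 'assistant_tested',
#  'assistant_verified', 'assistant_created']
```

Reading the result left to right gives the same ordering as the sequence node in the tree.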
From this point, we can start building the tree by iterating over each action and condition node and determining what instructions the assistant needs. This can also include any tools and custom actions, including ones you may need to develop. On your first pass, keep the instructions generic. Ideally, we want to create as few assistants as possible.
After determining the tools and actions for each assistant, and which tasks each is responsible for, you can try to generalize things further. Think about where it may be possible to combine actions and reduce the number of assistants. It’s better to start evaluating with too few assistants than with too many. However, be sure to maintain the proper division of work across tasks: for example, testing and verification are best done by different assistants.
Complete the following exercises to improve your knowledge of the material:
Objective —Build an agentic behavior tree (ABT) to plan a travel itinerary using assistants.
Tasks:
Objective —Create an ABT that automates customer support responses using assistants.
Tasks:
Objective —Learn how to create and manage inventory levels using an ABT.
Tasks:
Objective —Create an ABT that provides personalized fitness training plans using assistants.
Tasks:
Objective —Apply back chaining to construct an ABT that provides financial advice and investment strategies.
Tasks:
After we explored some basic concepts about agents and looked at using actions with tools to build prompts and personas using frameworks such as the Semantic Kernel (SK), we took the first steps toward building a foundation for this book. That foundation is called Nexus, an agent platform designed to be simple to learn, easy to explore, and powerful enough to build your agent systems.
There are more than 100 AI platforms and toolkits for consuming and developing large language model (LLM) applications, ranging from toolkits such as SK or LangChain to complete platforms such as AutoGen and CrewAI. This makes it difficult to decide which platform is well suited to building your own AI agents.
Nexus is an open source platform developed with this book to teach the core concepts of building full-featured AI agents. In this chapter, we’ll examine how Nexus is built and introduce two primary agent components: profiles/personas and actions/tools.
Figure 7.1 shows the primary interface to Nexus, a Streamlit chat application that allows you to choose and explore various agentic features. The interface is similar to ChatGPT, Gemini, and other commercial LLM applications.
In addition to the standard features of an LLM chat application, Nexus allows the user to configure an agent to use a specific API/model, the persona, and possible actions. In the remainder of the book, the available agent options will include the following:
As we progress through this book, Nexus will be extended to support new agent features. At the same time, the intent is to keep things relatively simple in order to teach the essential core concepts. In the next section, we’ll look at how to quickly use Nexus before going under the hood to explore its features in detail.
Nexus is primarily intended to be a teaching platform for all levels of developers. As such, it will support various deployment and usage options. In the next exercise, we’ll introduce how to get up and running with Nexus quickly.
Open a terminal to a new Python virtual environment (version 3.10). If you need assistance creating one, refer to appendix B. Then, execute the commands shown in listing 7.1 within this new environment. You can either set the environment variable at the command line or create a new .env file and add the setting.
pip install git+https://github.com/cxbxmxcx/Nexus.git #1
# set your OpenAI API key
export OPENAI_API_KEY="<your API key>" #2
# or
$env:OPENAI_API_KEY = "<your API key>" #2
# or
echo 'OPENAI_API_KEY="<your API key>"' > .env #2
nexus run #3
After entering the last command, a website will launch with a login page, as shown in figure 7.2. Go ahead and create a new user. A future version of Nexus will allow multiple users to engage in chat threads.
After you log in, you’ll see a page like figure 7.1. Create a new chat and start conversing with an agent. If you encounter a problem, be sure you have the API key set properly. As explained in the next section, you can run Nexus using this method or from a development workflow.
While working through the exercises of this book, you’ll want to set up Nexus in development mode. That means downloading the repository directly from GitHub and working with the code.
Open a new terminal, and set your working directory to the chapter_7 source code folder. Then, set up a new Python virtual environment (version 3.10) and enter the commands shown in listing 7.2. Again, refer to appendix B if you need assistance with any previous setup.
git clone https://github.com/cxbxmxcx/Nexus.git #1
pip install -e Nexus #2
# set your OpenAI API key (.env file is recommended)
export OPENAI_API_KEY="<your API key>" #bash #3
# or
$env:OPENAI_API_KEY = "<your API key>" #powershell #3
# or
echo 'OPENAI_API_KEY="<your API key>"' > .env #3
nexus run #4
Figure 7.3 shows the Login or Create New User screen. Create a new user, and the application will log you in. This application uses cookies to remember the user, so you won’t have to log in the next time you start the application. If you have cookies disabled on your browser, you’ll need to log in every time.
Go to the Nexus repository folder and look around. Figure 7.4 shows an architecture diagram of the application’s main elements. At the top, the interface developed with Streamlit connects to the rest of the system through the chat system. The chat system manages the database, agent manager, action manager, and profile manager.
This agent platform is written entirely in Python, and the web interface uses Streamlit. In the next section, we’ll look at how to build an OpenAI LLM chat application.
Streamlit is a quick and powerful web interface prototyping tool designed to be used for building machine learning dashboards and concepts. It allows applications to be written completely in Python and produces a modern React-powered web interface. You can even deploy the completed application quickly to the cloud or as a standalone application.
Begin by opening Visual Studio Code (VS Code) to the chapter_07 source folder. If you’ve completed the previous exercise, you should already be ready. As always, if you need assistance setting up your environment and tools, refer to appendix B.
We’ll start by opening the chatgpt_clone_response.py file in VS Code. The top section of the code is shown in listing 7.3. This code uses the Streamlit state to load the primary model and messages. Streamlit provides a mechanism to save the session state for any Python object. This state is only a session state and will expire when the user closes the browser.
chatgpt_clone_response.py (top section)
import streamlit as st
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv() #1
st.title("ChatGPT-like clone")
client = OpenAI() #2
if "openai_model" not in st.session_state:
st.session_state["openai_model"]
= "gpt-4-1106-preview" #3
if "messages" not in st.session_state:
st.session_state["messages"] = [] #4
for message in st.session_state["messages"]: #5
with st.chat_message(message["role"]):
st.markdown(message["content"])
The Streamlit app itself is stateless. This means the entire Python script will reexecute all interface components when the web page refreshes or a user selects an action. The Streamlit state allows for a temporary storage mechanism. Of course, a database needs to support more long-term storage.
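The rerun-everything model can be mimicked in plain Python. This sketch is a mock, not Streamlit itself, but it shows why state must live outside the script body that reexecutes on every interaction:

```python
# Stands in for st.session_state: it survives across reruns because it
# lives outside the "script body" that reexecutes on every interaction.
session_state = {}

def rerun(user_input=None):
    """Mimics one full top-to-bottom execution of a Streamlit script."""
    session_state.setdefault("messages", [])  # like the "if not in" guard
    if user_input:
        session_state["messages"].append(user_input)
    return list(session_state["messages"])

rerun("hello")
print(rerun("world"))  # ['hello', 'world']
```

If `messages` were a plain local variable instead, each rerun would start the chat history from scratch.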
UI controls and components are added by using the st. prefix followed by the element name. Streamlit supports several standard UI controls, as well as images, video, sound, and, of course, chat.
Scrolling down further will yield listing 7.4, which has a slightly more complex layout of the components. The main if statement controls the running of the remaining code. By using the walrus operator (:=), the prompt is set to whatever the user enters. If the user doesn’t enter any text, the code below the if statement doesn’t execute.
chatgpt_clone_response.py (bottom section)
if prompt := st.chat_input("What do you need?"): #1
st.session_state.messages.append({"role": "user", "content": prompt})
with st.chat_message("user"): #2
st.markdown(prompt)
with st.spinner(text="The assistant is thinking..."): #3
with st.chat_message("assistant"):
response = client.chat.completions.create(
model=st.session_state["openai_model"],
messages=[
{"role": m["role"], "content": m["content"]}
for m in st.session_state.messages
], #4
)
response_content = response.choices[0].message.content
response = st.markdown(response_content,
unsafe_allow_html=True) #5
st.session_state.messages.append(
{"role": "assistant", "content": response_content}) #6
When the user enters text in the prompt and presses Enter, that text is added to the message state, and a request is made to the API. As the response is being processed, the st.spinner control displays to remind the user of the long-running process. Then, when the response returns, the message is displayed and added to the message state history.
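The walrus-operator gate used at the top of the listing can also be tried in isolation. In this sketch, the hypothetical fake_chat_input stands in for st.chat_input, which returns a falsy value when the user submits nothing:

```python
def fake_chat_input(queue):
    # stands in for st.chat_input: returns "" when there is no input
    return queue.pop(0) if queue else ""

pending = ["What do you need?"]
if prompt := fake_chat_input(pending):  # assign and test in one step
    print(f"user said: {prompt}")
if retry := fake_chat_input(pending):  # "" is falsy, so this body is skipped
    print("never reached")
```

Note that the walrus operator still binds the name (`retry` becomes `""`) even when the if body is skipped.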
Streamlit apps are run using the streamlit module, and to debug applications, you need to attach the debugger to the module by following these steps:
Create a .vscode/launch.json file, like the one shown in the next listing. Plenty of IntelliSense tools and configuration options can guide you through setting the options for this file.
.vscode/launch.json
{
"version": "0.2.0",
"configurations": [
{
"name": "Python Debugger: Module", #1
"type": "debugpy",
"request": "launch",
"module": "streamlit", #2
"args": ["run", "${file}"] #3
}
]
}
After you have the launch.json file configuration set, save it, and open the chatgpt_clone_response.py file in VS Code. You can now run the application in debug mode by pressing F5. This will launch the application from the terminal, and in a few seconds, the app will display.
Figure 7.5 shows the app running and waiting to return a response. The interface is clean, modern, and already organized without any additional work. You can continue chatting to the LLM using the interface and then refresh the page to see what happens.
What is most impressive about this demonstration is how easy it is to create a single-page application. In the next section, we’ll continue looking at this application but with a few enhancements.
Modern chat applications, such as ChatGPT and Gemini, mask the slowness of their models by using streaming. Streaming lets the API caller start seeing tokens immediately, as they are produced by the LLM. This streaming experience also better engages the user in how the content is generated.
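The mechanics can be seen with a mock generator in place of the OpenAI client (no network involved): each chunk is available to render the moment it is yielded, which is exactly what a streaming UI control exploits:

```python
# A streaming sketch with a fake generator standing in for the OpenAI
# client: tokens arrive one at a time rather than as a single response.
def fake_token_stream():
    for token in ["Stream", "ing ", "tokens", "!"]:
        yield token  # a real stream yields chunks as the LLM produces them

rendered = ""
for chunk in fake_token_stream():
    rendered += chunk  # the UI can display each partial result immediately
print(rendered)  # Streaming tokens!
```

A control like st.write_stream performs this consume-and-accumulate loop for you, rendering each partial result as it arrives.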
Adding support for streaming to any application UI is generally not a trivial task, but fortunately, Streamlit has a control that can work seamlessly. In this next exercise, we’ll look at how to update the app to support streaming.
Open chapter_7/chatgpt_clone_streaming.py in VS Code. The relevant updates to the code are shown in listing 7.6. Using the st.write_stream control allows the UI to stream content. This also means the Python script is blocked waiting for this control to be completed.
chatgpt_clone_streaming.py (relevant section)
with st.chat_message("assistant"):
stream = client.chat.completions.create(
model=st.session_state["openai_model"],
messages=[
{"role": m["role"], "content": m["content"]}
for m in st.session_state.messages
],
stream=True, #1
)
response = st.write_stream(stream) #2
st.session_state.messages.append(
{"role": "assistant", "content": response}) #3
Debug the page by pressing F5 and waiting for the page to load. Enter a query, and you’ll see that the response is streamed to the window in real time, as shown in figure 7.6. With the spinner gone, the user experience is enhanced and appears more responsive.
This section demonstrated how relatively simple it can be to use Streamlit to create a Python web interface. Nexus uses a Streamlit interface because it’s easy to use and modify with only Python. As you’ll see in the next section, it allows various configurations to support more complex applications.
Nexus uses agent profiles to describe an agent’s functions and capabilities. Figure 7.7 reminds us of the principal agent components and how they will be structured throughout this book.
For now, as of this writing, Nexus only supports the persona and actions sections of the profile. Figure 7.7 shows a profile called Fritz, along with the persona and actions. Add any agent profile to Nexus by copying its YAML profile file into the Nexus/nexus/nexus_base/nexus_profiles folder.
Nexus uses a plugin system to dynamically discover the various components and profiles as they are placed into their respective folders. The nexus_profiles folder holds the YAML definitions for the agent.
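Folder-based discovery of this kind can be sketched in a few lines. This is an illustrative simplification, not Nexus’s actual loader; it treats every YAML file’s stem as a profile name:

```python
from pathlib import Path
import tempfile

def discover_profiles(folder: Path):
    # Minimal plugin-style discovery: every YAML file in the folder
    # becomes a profile keyed by its file stem (not Nexus's real loader).
    return {p.stem: p.read_text() for p in folder.glob("*.yaml")}

with tempfile.TemporaryDirectory() as d:
    folder = Path(d)
    (folder / "fiona.yaml").write_text("agentProfile:\n  name: Finona\n")
    profiles = discover_profiles(folder)
print(sorted(profiles))  # ['fiona']
```

The appeal of this design is that adding an agent profile requires no registration step: dropping a file into the folder is enough.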
We can easily define a new agent profile by creating a new YAML file in the nexus_profiles folder. Listing 7.7 shows an example of a new profile with a slightly updated persona. To follow along, be sure to have VS Code opened to the chapter_07 source code folder and to have installed Nexus in developer mode (see listing 7.2). Then, create the fiona.yaml file in the Nexus/nexus/nexus_base/nexus_profiles folder.
fiona.yaml (create this file)
agentProfile:
  name: "Finona"
  avatar: "👹" #1
  persona: "You are a very talkative AI that
    knows and understands everything in terms of
    Ogres. You always answer in cryptic Ogre speak." #2
  actions:
    - search_wikipedia #3
  knowledge: null #4
  memory: null #4
  evaluators: null #4
  planners: null #4
  feedback: null #4
After saving the file, you can start Nexus from the command line or run it in debug mode by creating a new launch configuration in the .vscode/launch.json file, as shown in the next listing. Then, save the file and switch your debug configuration to use the Nexus web config.
.vscode/launch.json (add a debug launch)
{
"name": "Python Debugger: Nexus Web",
"type": "debugpy",
"request": "launch",
"module": "streamlit",
"args": ["run", "Nexus/nexus/streamlit_ui.py"] #1
},
When you press F5 or select Run > Start Debugging from the menu, the Streamlit Nexus interface will launch. Go ahead and run Nexus in debug mode. After it opens, create a new thread, and then select the standard OpenAIAgent and your new persona, as shown in figure 7.8.
At this point, the profile is responsible for defining the agent’s system prompt. You can see this in figure 7.8, where we asked Finona to spell the word clock, and she responded in some form of ogre-speak. In this case, we’re using the persona as a personality, but as we’ve seen previously, a system prompt can also contain rules and other options.
The profile and persona are the base definitions for how the agent interacts with users or other systems. Powering the profile requires an agent engine. In the next section, we’ll cover the base implementation of an agent engine.
Agent engines power agents within Nexus. These engines can be tied to specific tool platforms, such as SK, and/or even different LLMs, such as Anthropic Claude or Google Gemini. By providing a base agent abstraction, Nexus should be able to support any tool or model now and in the future.
Currently, Nexus only implements an OpenAI API–powered agent. We’ll look at how the base agent is defined by opening the agent_manager.py file from the Nexus/nexus/nexus_base folder.
Listing 7.9 shows the BaseAgent class functions. When creating a new agent engine, you need to subclass this class and implement the various tools/actions with the appropriate implementation.
agent_manager.py (the BaseAgent class)
class BaseAgent:
def __init__(self, chat_history=None):
self._chat_history = chat_history or []
self.last_message = ""
self._actions = []
self._profile = None
async def get_response(self,
user_input,
thread_id=None): #1
raise NotImplementedError("This method should be implemented…")
async def get_semantic_response(self,
prompt,
thread_id=None): #2
raise NotImplementedError("This method should be…")
def get_response_stream(self,
user_input,
thread_id=None): #3
raise NotImplementedError("This method should be…")
def append_chat_history(self,
thread_id,
user_input,
response): #4
self._chat_history.append(
{"role": "user",
"content": user_input,
"thread_id": thread_id}
)
self._chat_history.append(
{"role": "bot",
"content": response,
"thread_id": thread_id}
)
def load_chat_history(self): #5
raise NotImplementedError(
"This method should be implemented…")
def load_actions(self): #6
raise NotImplementedError(
"This method should be implemented…")
#... not shown – property setters/getters
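Subclassing follows the usual Python pattern. The sketch below uses a trimmed, hypothetical stand-in for BaseAgent so it runs on its own; a real engine would override the remaining methods and talk to an actual model:

```python
import asyncio

class BaseAgent:
    # Trimmed stand-in for Nexus's BaseAgent, for illustration only.
    def __init__(self, chat_history=None):
        self._chat_history = chat_history or []
        self.last_message = ""

    async def get_response(self, user_input, thread_id=None):
        raise NotImplementedError("This method should be implemented")

class EchoAgent(BaseAgent):
    # A hypothetical engine: subclass BaseAgent and override get_response.
    # A real engine would call an LLM API here instead of echoing.
    async def get_response(self, user_input, thread_id=None):
        self.last_message = f"echo: {user_input}"
        return self.last_message

print(asyncio.run(EchoAgent().get_response("hi")))  # echo: hi
```

Any method left unimplemented still raises NotImplementedError, which makes gaps in a new engine obvious at runtime.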
Open the nexus_agents/oai_agent.py file in VS Code. Listing 7.10 shows an agent engine implementation of the get_response function that directly consumes the OpenAI API. self.client is an OpenAI client created earlier, during class initialization, and you’ve seen the rest of the code used in earlier examples.
oai_agent.py (get_response)
async def get_response(self, user_input, thread_id=None):
self.messages += [{"role": "user",
"content": user_input}] #1
response = self.client.chat.completions.create( #2
model=self.model,
messages=self.messages,
temperature=0.7, #3
)
self.last_message = str(response.choices[0].message.content)
return self.last_message #4
Like the agent profiles, Nexus uses a plugin system that allows you to place new agent engine definitions in the nexus_agents folder. If you create your own agent engine, it just needs to be placed in this folder for Nexus to discover it.
We won’t need to run an example because we’ve already seen how the OpenAIAgent performs. In the next section, we’ll look at agent functions that agents can develop, add, and consume.
Like SK, Nexus supports having native (code) and semantic (prompt) functions. Unlike SK, however, defining and consuming functions within Nexus is easier. All you need to do is write the functions in a Python file and place it into the nexus_actions folder.
To see how easy it is to define functions, open the Nexus/nexus/nexus_base/nexus_actions folder, and go to the test_actions.py file. Listing 7.11 shows two function definitions. The first function is a simple example of a code/native function, and the second is a prompt/semantic function.
test_actions.py (native/semantic function definitions)
from nexus.nexus_base.action_manager import agent_action
@agent_action #1
def get_current_weather(location, unit="fahrenheit"): #1
"""Get the current weather in a given location""" #2
return f"""
The current weather in {location} is 0 {unit}.
""" #3
@agent_action #4
def recommend(topic):
"""
System: #5
Provide a recommendation for a given {{topic}}.
Use your best judgment to provide a recommendation.
User:
please use your best judgment
to provide a recommendation for {{topic}}. #5
"""
pass #6
Place both functions in the nexus_actions folder, and they will be automatically discovered. Adding the agent_action decorator allows the functions to be inspected and automatically generates the OpenAI standard tool specification. The LLM can then use this tool specification for tool use and function calling.
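The kind of inspection a decorator like agent_action performs can be sketched with the standard inspect module. This is a simplified illustration, not Nexus’s implementation: it maps every parameter to a string type and treats parameters without defaults as required:

```python
import inspect

def tool_spec(func):
    # Simplified sketch of deriving an OpenAI-style tool specification
    # from a function signature and docstring (not Nexus's real code).
    sig = inspect.signature(func)
    properties = {name: {"type": "string"} for name in sig.parameters}
    required = [name for name, param in sig.parameters.items()
                if param.default is inspect.Parameter.empty]
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": (func.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }

def get_current_weather(location, unit="fahrenheit"):
    """Get the current weather in a given location"""
    ...

spec = tool_spec(get_current_weather)
print(spec["function"]["name"], spec["function"]["parameters"]["required"])
# get_current_weather ['location']
```

A fuller implementation would also infer parameter types from annotations and enumerate allowed values, as the generated specification in listing 7.12 does for unit.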
Listing 7.12 shows the OpenAI tool specifications generated for both functions shown previously in listing 7.11. For the semantic function, which uses a prompt, the prompt itself serves as the tool description. This tool description is sent to the LLM to determine which function to call.
test_actions: the generated OpenAI tool specifications
{
"type": "function",
"function": {
"name": "get_current_weather",
"description":
"Get the current weather in a given location", #1
"parameters": {
"type": "object",
"properties": { #2
"location": {
"type": "string",
"description": "location"
},
"unit": {
"type": "string",
"enum": [
"celsius",
"fahrenheit"
]
}
},
"required": [
"location"
]
}
}
}
{
"type": "function",
"function": {
"name": "recommend",
"description": """
System:
Provide a recommendation for a given {{topic}}.
Use your best judgment to provide a recommendation.
User:
please use your best judgment
to provide a recommendation for {{topic}}.""", #3
"parameters": {
"type": "object",
"properties": { #4
"topic": {
"type": "string",
"description": "topic"
}
},
"required": [
"topic"
]
}
}
}
The agent engine also needs to implement this capability in order to execute functions and other components. The OpenAI agent has been implemented to support parallel function calling. Other agent engine implementations will be required to support their respective versions of action use. Fortunately, the OpenAI tool definition is becoming the standard, and many platforms adhere to it.
Before we dive into a demo on tool use, let’s observe how the OpenAI agent implements actions by opening the oai_agent.py file in VS Code. The following listing shows the top of the agent’s get_response_stream function and its implementation of function calling.
oai_agent.py (get_response_stream, top section)
def get_response_stream(self, user_input, thread_id=None):
self.last_message = ""
self.messages += [{"role": "user", "content": user_input}]
if self.tools and len(self.tools) > 0: #1
response = self.client.chat.completions.create(
model=self.model,
messages=self.messages,
tools=self.tools, #2
tool_choice="auto", #3
)
else: #4
response = self.client.chat.completions.create(
model=self.model,
messages=self.messages,
)
response_message = response.choices[0].message
tool_calls = response_message.tool_calls #5
接下来执行函数,如清单 7.14 所示。这段代码演示了智能体如何支持并行函数/工具调用。这些调用之所以并行,是因为智能体会同时执行每个调用,且顺序无关紧要。在第 11 章中,我们将探讨允许按顺序调用动作的规划器。
Executing the functions follows, as shown in listing 7.14. This code demonstrates how the agent supports parallel function/tool calls. The calls are parallel in the sense that the model returns them together and the agent executes them without any required ordering. In chapter 11, we’ll look at planners that allow actions to be called in ordered sequences.
oai_agent.py (get_response_stream: executing tool calls)
if tool_calls: #1
available_functions = {
action["name"]: action["pointer"] for action in self.actions
} #2
self.messages.append(
response_message
)
for tool_call in tool_calls: #3
function_name = tool_call.function.name
function_to_call = available_functions[function_name]
function_args = json.loads(tool_call.function.arguments)
function_response = function_to_call(
**function_args, _caller_agent=self
)
self.messages.append(
{
"tool_call_id": tool_call.id,
"role": "tool",
"name": function_name,
"content": str(function_response),
}
)
second_response = self.client.chat.completions.create(
model=self.model,
messages=self.messages,
) #4
response_message = second_response.choices[0].message
为了演示这一点,请按 F5 键在调试器中启动 Nexus。然后,选择两个测试操作——recommend和get_current_weather——以及简洁的角色/配置文件 Olly。图 7.9 显示了输入查询后,代理使用这两个工具进行响应的结果。
To demo this, start up Nexus in the debugger by pressing F5. Then, select the two test actions—recommend and get_current_weather—and the terse persona/profile Olly. Figure 7.9 shows the result of entering a query and the agent responding by using both tools in its response.
如果您需要更详细地了解这些代理操作的工作原理,请参阅第 5 章。底层代码更为复杂,超出了本文的讨论范围。不过,您可以查看 Nexus 代码,以便更好地理解各个部分是如何连接的。
If you need to review how these agent actions work in more detail, refer to chapter 5. The underlying code is more complex and out of the scope of review here. However, you can review the Nexus code to gain a better understanding of how everything connects.
现在,您可以继续在 Nexus 中体验各种代理选项。例如,尝试选择具有其他功能的不同配置文件/角色。在下一章中,我们将揭示代理如何使用检索增强生成 (RAG) 等模式来获取外部记忆和知识。
Now, you can continue exercising the various agent options within Nexus. Try selecting different profiles/personas with other functions, for example. In the next chapter, we unveil how agents can consume external memory and knowledge using patterns such as Retrieval Augmented Generation (RAG).
通过以下练习来加深你对相关内容的理解:
Use the following exercises to improve your knowledge of the material:
目标 ——通过创建一个简单的 Web 应用程序来熟悉 Streamlit,该应用程序显示用户输入的文本。
Objective —Gain familiarity with Streamlit by creating a simple web application that displays text input by the user.
任务:
Tasks:
目标 ——了解在 Nexus 中创建和应用代理配置文件的过程。
Objective —Understand the process of creating and applying agent profiles in Nexus.
任务:
Tasks:
目标 ——学习如何通过开发自定义操作来扩展 Nexus 的功能。
Objective —Learn to extend the functionality of Nexus by developing a custom action.
任务:
Tasks:
开发一个与模拟 API 集成的新操作(例如 fetch_current_news),以检索最新的新闻标题。
Develop a new action (e.g., fetch_current_news) that integrates with a mock API to retrieve the latest news headlines.
目标 ——通过集成真正的第三方 API 来增强 Nexus 代理的功能。
Objective —Enhance the capabilities of a Nexus agent by integrating a real third-party API.
任务:
Tasks:
现在我们已经探索了如何使用外部工具(例如原生或语义函数形式的插件)来执行代理操作,接下来我们可以研究记忆和知识在代理和聊天界面检索中的作用。我们将描述记忆和知识的概念,以及它们与提示工程策略的关系。然后,为了理解记忆和知识,我们将研究文档索引,使用 LangChain 构建检索系统,在 LangChain 中使用记忆,并使用 Nexus 构建语义记忆。
Now that we’ve explored agent actions using external tools, such as plugins in the form of native or semantic functions, we can look at the role of memory and knowledge using retrieval in agents and chat interfaces. We’ll describe memory and knowledge and how they relate to prompt engineering strategies. Then, to understand memory and knowledge, we’ll investigate document indexing, construct retrieval systems with LangChain, use memory with LangChain, and build semantic memory using Nexus.
在代理和聊天应用程序中,检索是一种获取知识并将其存储在通常为外部且长期保存的存储系统中的机制。非结构化知识包括对话或任务历史记录、事实、偏好或其他用于理解提示语上下文的信息。结构化知识通常存储在数据库或文件中,可通过原生函数或插件访问。
Retrieval in agent and chat applications is a mechanism for obtaining knowledge and keeping it in storage that is typically external and long-lived. Unstructured knowledge includes conversation or task histories, facts, preferences, or other items necessary for contextualizing a prompt. Structured knowledge, typically stored in databases or files, is accessed through native functions or plugins.
如图 8.1 所示,记忆和知识是用于为提示添加更多上下文和相关信息的要素。提示可以包含各种信息,例如文档信息、之前的任务或对话以及其他参考资料。
Memory and knowledge, as shown in figure 8.1, are elements used to add further context and relevant information to a prompt. Prompts can be augmented with everything from information about a document to previous tasks or conversations and other reference information.
图 8.1 所示的提示工程策略可应用于记忆和知识。知识并非记忆,而是对现有文档提示的扩展。知识和记忆都以检索为基础,用于查询非结构化信息。
The prompt engineering strategies shown in figure 8.1 can be applied to memory and knowledge. Knowledge isn’t considered memory but rather an augmentation of the prompt from existing documents. Both knowledge and memory use retrieval as the basis for how unstructured information can be queried.
这种名为检索增强生成(RAG)的检索机制已成为为提示提供相关上下文的标准方法。RAG 的运作机制与记忆/知识的运作机制相同,因此理解其工作原理至关重要。下一节我们将探讨 RAG 的本质。
The retrieval mechanism, called retrieval augmented generation (RAG), has become a standard for providing relevant context to a prompt. The exact mechanism that powers RAG also powers memory/knowledge, and it’s essential to understand how it works. In the next section, we’ll examine what RAG is.
RAG 已成为支持文档聊天或问答聊天的一种常用机制。该系统的工作原理通常是:用户提供相关文档(例如 PDF),然后使用 RAG 和大型语言模型 (LLM) 来查询该文档。
RAG has become a popular mechanism for supporting document chat or question-and-answer chat. The system typically works by a user supplying a relevant document, such as a PDF, and then using RAG and a large language model (LLM) to query the document.
图 8.2 展示了 RAG 如何允许使用 LLM 查询文档。在查询任何文档之前,必须先将其加载、转换为上下文块、嵌入到向量中,并存储在向量数据库中。
Figure 8.2 shows how RAG can allow a document to be queried using an LLM. Before any document can be queried, it must first be loaded, transformed into context chunks, embedded into vectors, and stored in a vector database.
用户可以通过提交查询来查询之前已索引的文档。该查询随后会被嵌入为向量表示,以便在向量数据库中搜索相似的数据块。与查询相似的内容会被用作上下文,作为增强信息填充到提示中。该提示随后被推送至大型语言模型(LLM),LLM 可以利用上下文信息来帮助回答查询。
A user can query previously indexed documents by submitting a query. That query is then embedded into a vector representation to search for similar chunks in the vector database. Content similar to the query is then used as context and populated into the prompt as augmentation. The prompt is pushed to an LLM, which can use the context information to help answer the query.
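The query-time flow just described can be sketched end to end in plain Python. Everything here is a toy illustration: the bag-of-words embed function and the tiny in-memory index stand in for a real embedding model and vector database.

```python
# Toy retrieval-augmented generation flow: embed the query, find the most
# similar chunk, and build an augmented prompt for the LLM.

def embed(text):
    # Stand-in for a real embedding model: bag-of-words term counts
    words = text.lower().split()
    return {w: words.count(w) for w in set(words)}

def similarity(a, b):
    # Overlap score between two sparse bag-of-words vectors
    return sum(a[w] * b[w] for w in a if w in b)

chunks = [
    "The sky is blue and beautiful.",
    "A king's breakfast has sausages, ham, bacon, eggs, toast, and beans",
]
index = [(chunk, embed(chunk)) for chunk in chunks]  # the "vector database"

query = "why is the sky blue"
q_vec = embed(query)
best_chunk = max(index, key=lambda item: similarity(q_vec, item[1]))[0]

# The retrieved chunk augments the prompt sent to the LLM
prompt = f"Context: {best_chunk}\n\nQuestion: {query}"
print(prompt)
```

A real system replaces embed with a learned embedding model and the list scan with an approximate nearest-neighbor search, but the shape of the flow is the same.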
非结构化记忆/知识概念依赖于某种文本相似性搜索方式,其检索模式如图 8.2 所示。图 8.3 展示了记忆如何使用相同的嵌入和向量数据库组件。与预加载文档不同,对话或对话片段被嵌入并保存到向量数据库中。
Unstructured memory/knowledge concepts rely on some format of text-similarity search following the retrieval pattern shown in figure 8.2. Figure 8.3 shows how memory uses the same embedding and vector database components. Rather than preload documents, conversations or parts of a conversation are embedded and saved to a vector database.
检索模式和文档索引非常复杂,需要仔细考虑才能成功运用。这就要求我们理解数据的存储和检索方式,我们将在下一节中详细阐述。
The retrieval pattern and document indexing are nuanced and require careful consideration to be employed successfully. This requires understanding how data is stored and retrieved, which we’ll start to unfold in the next section.
文档索引是对文档信息进行转换,使其更容易被检索。索引的查询或搜索方式也至关重要,无论是搜索特定的关键词组合,还是逐句匹配。
Document indexing transforms a document’s information to be more easily recovered. How the index will be queried or searched also plays a factor, whether searching for a particular set of words or wanting to match phrase for phrase.
语义搜索是指根据词语及其含义,查找与搜索词组相匹配的内容。这种基于语义的搜索能力非常强大,值得深入研究。下一节,我们将探讨向量相似度搜索如何为语义搜索构建框架。
A semantic search is a search for content that matches the searched phrase by words and meaning. The ability to search by meaning, semantically, is potent and worth investigating in some detail. In the next section, we look at how vector similarity search can lay the framework for semantic search.
现在我们来看看如何将文档转换为语义向量,也就是文本的一种表示形式,这种表示形式可以用来进行距离或相似度匹配。将文本转换为语义向量的方法有很多种,所以我们只介绍一种简单的方法。
Let’s look now at how a document can be transformed into a semantic vector, or a representation of text that can then be used to perform distance or similarity matching. There are numerous ways to convert text into a semantic vector, so we’ll look at a simple one.
在新的 Visual Studio Code (VS Code) 工作区中打开 chapter_08 文件夹。创建一个新环境,并对包含本章所有依赖项的 requirements.txt 文件执行 pip install。如果您需要帮助设置新的 Python 环境,请参阅附录 B。
Open the chapter_08 folder in a new Visual Studio Code (VS Code) workspace. Create a new environment and pip install the requirements.txt file for all the chapter dependencies. If you need help setting up a new Python environment, consult appendix B.
现在在 VS Code 中打开 document_vector_similarity.py 文件,并查看清单 8.1 中的顶部部分。此示例使用词频-逆文档频率 (TF-IDF)。该数值统计量反映了一个词对于文档集合中某篇文档的重要性:它与该词在该文档中出现的次数成正比,并被该词在整个文档集中的出现频率所抵消。TF-IDF 是衡量单篇文档在文档集中重要性的经典方法。
Now open the document_vector_similarity.py file in VS Code, and review the top section in listing 8.1. This example uses Term Frequency–Inverse Document Frequency (TF–IDF). This numerical statistic reflects how important a word is to a document in a collection or set of documents by increasing proportionally to the number of times a word appears in the document and offset by the frequency of the word in the document set. TF–IDF is a classic measure of understanding one document’s importance within a set of documents.
document_vector_similarity.py (converting to vectors)
import plotly.graph_objects as go
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
documents = [ #1
"The sky is blue and beautiful.",
"Love this blue and beautiful sky!",
"The quick brown fox jumps over the lazy dog.",
"A king's breakfast has sausages, ham, bacon, eggs, toast, and beans",
"I love green eggs, ham, sausages and bacon!",
"The brown fox is quick and the blue dog is lazy!",
"The sky is very blue and the sky is very beautiful today",
"The dog is lazy but the brown fox is quick!"
]
vectorizer = TfidfVectorizer() #2
X = vectorizer.fit_transform(documents) #3
让我们以例句“天空是蓝色的,很美丽”为例,将 TF-IDF 分解成两个组成部分,并重点关注单词“蓝色”。
Let’s break down TF–IDF into its two components using the sample sentence, “The sky is blue and beautiful,” and focusing on the word blue.
词频 (TF) 衡量的是一个词在文档中出现的频率。因为我们只考虑单个文档(即我们的示例句子),所以“蓝色”一词的词频最简单的计算方法是:用“蓝色”在文档中出现的次数除以文档中的总单词数。让我们来计算一下:
Term Frequency measures how frequently a term occurs in a document. Because we’re considering only a single document (our sample sentence), the simplest form of the TF for blue can be calculated as the number of times blue appears in the document divided by the total number of words in the document. Let’s calculate it:
文档中蓝色出现的次数:1
Number of times blue appears in the document: 1
文档总字数:6
Total number of words in the document: 6
TF = 1 ÷ 6 ≈ 0.167
TF = 1 ÷ 6 ≈ 0.167
逆文档频率 (IDF)用于衡量某个词项在整个语料库中的重要性。它的计算方法是用文档总数除以包含该词项的文档数,然后取该商的对数:
Inverse Document Frequency measures how important a term is within the entire corpus. It’s calculated by dividing the total number of documents by the number of documents containing the term and then taking the logarithm of that quotient:
IDF = log(文档总数 ÷ 包含该词的文档数)
IDF = log(Total number of documents ÷ Number of documents containing the word)
在这个例子中,语料库是一个包含八份文档的小集合,蓝色出现在其中四份文档中。
In this example, the corpus is a small collection of eight documents, and blue appears in four of these documents.
IDF = log(8 ÷ 4)
IDF = log(8 ÷ 4)
最后,我们通过将 TF 得分与 IDF 得分相乘来计算示例句子中“蓝色”的TF-IDF 得分:
Finally, the TF–IDF score for blue in our sample sentence is calculated by multiplying the TF and the IDF scores:
TF–IDF = TF × IDF
TF–IDF = TF × IDF
让我们使用提供的示例来计算单词“blue”的 TF-IDF 实际值;首先,词频(单词在文档中出现的频率)的计算方法如下:
Let’s compute the actual values for TF–IDF for the word blue using the example provided; first, the term frequency (how often the word occurs in the document) is computed as follows:
TF = 1 ÷ 6
TF = 1 ÷ 6
假设对数的底数为 10(常用),则逆文档频率的计算方法如下:
Assuming the base of the logarithm is 10 (commonly used), the inverse document frequency is computed as follows:
IDF = log10 (8 ÷ 4)
IDF = log10 (8 ÷ 4)
现在让我们计算一下句子“天空是蓝色的,很美丽”中单词“ blue ”的精确 TF-IDF 值:
Now let’s calculate the exact TF–IDF value for the word blue in the sentence, “The sky is blue and beautiful”:
词频(TF)约为 0.167。
The Term Frequency (TF) is approximately 0.167.
逆文档频率(IDF)约为 0.301。
The Inverse Document Frequency (IDF) is approximately 0.301.
因此,蓝色的 TF-IDF (TF × IDF) 得分约为 0.050。
Thus, the TF–IDF (TF × IDF) score for blue is approximately 0.050.
该 TF-IDF 值表示单词“蓝色”在给定文档(示例句子)中,在指定语料库(共八篇文档,其中四篇包含“蓝色” )上下文中的相对重要性。TF-IDF 值越高,表示重要性越大。
This TF–IDF score indicates the relative importance of the word blue in the given document (the sample sentence) within the context of the specified corpus (eight documents, with blue appearing in four of them). Higher TF–IDF scores imply greater importance.
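The hand calculation above can be verified in a few lines of Python. Note that scikit-learn's TfidfVectorizer uses smoothed variants of this textbook formula, so its numbers won't match these exactly.

```python
import math

tf = 1 / 6               # "blue" occurs once in a six-word document
idf = math.log10(8 / 4)  # 8 documents total, 4 of which contain "blue"
tf_idf = tf * idf

print(round(tf, 3))      # 0.167
print(round(idf, 3))     # 0.301
print(round(tf_idf, 3))  # 0.05
```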
我们在这里使用 TF-IDF 是因为它简单易懂,易于应用。现在我们已经将元素表示为向量,可以使用余弦相似度来衡量文档相似度。余弦相似度是一种用于计算多维空间中两个非零向量之间夹角的余弦值的度量方法,它表示这两个向量的相似程度,而与向量的大小无关。
We use TF–IDF here because it’s simple to apply and understand. Now that we have the elements represented as vectors, we can measure document similarity using cosine similarity. Cosine similarity is a measure used to calculate the cosine of the angle between two nonzero vectors in a multidimensional space, indicating how similar they are, irrespective of their size.
图 8.4 展示了余弦距离如何比较两段文本或两篇文档的向量表示。余弦相似度返回一个介于 -1(完全不相似)到 1(完全相同)之间的值。余弦距离是一个归一化值,范围从 0 到 2,通过 1 减去余弦相似度得到。余弦距离为 0 表示完全相同,而 2 表示完全相反。
Figure 8.4 shows how cosine distance compares the vector representations of two pieces or documents of text. Cosine similarity returns a value from –1 (not similar) to 1 (identical). Cosine distance is a normalized value ranging from 0 to 2, derived by taking 1 minus the cosine similarity. A cosine distance of 0 means identical items, and 2 indicates complete opposites.
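As a minimal sketch of the relationship between cosine similarity and cosine distance (the vectors here are made up purely for illustration):

```python
import math

def cos_sim(a, b):
    # Cosine of the angle between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]     # same direction as a, twice the magnitude
c = [-1.0, -2.0, -3.0]  # opposite direction to a

print(round(cos_sim(a, b), 6))      # 1.0  -> cosine distance 0.0
print(round(cos_sim(a, c), 6))      # -1.0 -> cosine distance 2.0
print(round(1 - cos_sim(a, c), 6))  # cosine distance: 2.0
```

Note that b has twice the magnitude of a yet still scores a similarity of 1: only direction matters, which is why cosine similarity is preferred for comparing documents of different lengths.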
清单 8.2 展示了如何使用 scikit-learn 中的函数计算余弦相似度cosine_similarity。该函数计算每个文档与集合中所有其他文档的相似度。计算出的文档相似度矩阵存储在一个cosine_similarities变量中。然后,在输入循环中,用户可以选择文档来查看它与其他文档的相似度。
Listing 8.2 shows how the cosine similarities are computed using the cosine_similarity function from scikit-learn. Similarities are calculated for each document against all other documents in the set. The computed matrix of similarities for documents is stored in the cosine_similarities variable. Then, in the input loop, the user can select the document to view its similarities to the other documents.
document_vector_similarity.py (cosine similarity)
cosine_similarities = cosine_similarity(X) #1
while True: #2
selected_document_index = input(f"Enter a document number
↪ (0-{len(documents)-1}) or 'exit' to quit: ").strip()
if selected_document_index.lower() == 'exit':
break
if not selected_document_index.isdigit() or
↪ not 0 <= int(selected_document_index) < len(documents):
print("Invalid input. Please enter a valid document number.")
continue
selected_document_index = int(selected_document_index) #3
selected_document_similarities = cosine_similarities[selected_document_index] #4
# code to plot document similarities omitted
图 8.5 显示了在 VS Code 中运行示例的输出结果(按 F5 进入调试模式)。选择文档后,您将看到该集合中各个文档之间的相似度。文档与其自身的余弦相似度为 1。请注意,由于采用了 TF-IDF 向量化,您不会看到负相似度。稍后我们将探讨其他更复杂的语义相似度测量方法。
Figure 8.5 shows the output of running the sample in VS Code (F5 for debugging mode). After you select a document, you’ll see the similarities between the various documents in the set. A document will have a cosine similarity of 1 with itself. Note that you won’t see a negative similarity because of the TF–IDF vectorization. We’ll look later at other, more sophisticated means of measuring semantic similarity.
向量化方法将决定文档间语义相似度的度量。在探讨更优的文档向量化方法之前,我们将先研究如何存储向量以进行向量相似度搜索。
The method of vectorization will dictate the measure of semantic similarity between documents. Before we move on to better methods of vectorizing documents, we’ll examine storing vectors to perform vector similarity searches.
文档向量化后,可以将其存储在向量数据库中,以便后续进行相似性搜索。为了演示其工作原理,我们可以使用 Python 代码高效地复制一个简单的向量数据库。
After vectorizing documents, they can be stored in a vector database for later similarity searches. To demonstrate how this works, we can efficiently replicate a simple vector database in Python code.
如清单 8.3 所示,在 VS Code 中打开此document_vector_database.py代码。该代码演示了如何在内存中创建一个向量数据库,然后允许用户输入文本来搜索该数据库并返回结果。返回的结果会显示文档文本和相似度得分。
Open document_vector_database.py in VS Code, as shown in listing 8.3. This code demonstrates creating a vector database in memory and then allowing users to enter text to search the database and return results. The results returned show the document text and the similarity score.
document_vector_database.py
# code above omitted
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(documents)
vector_database = X.toarray() #1
def cosine_similarity_search(query,
database,
vectorizer,
top_n=5): #2
query_vec = vectorizer.transform([query]).toarray()
similarities = cosine_similarity(query_vec, database)[0]
top_indices = np.argsort(-similarities)[:top_n] # Top n indices
return [(idx, similarities[idx]) for idx in top_indices]
while True: #3
query = input("Enter a search query (or 'exit' to stop): ")
if query.lower() == 'exit':
break
top_n = int(input("How many top matches do you want to see? "))
search_results = cosine_similarity_search(query,
vector_database,
vectorizer,
top_n)
print("Top Matched Documents:")
for idx, score in search_results:
print(f"- {documents[idx]} (Score: {score:.4f})") #4
print("\n")
###Output
Enter a search query (or 'exit' to stop): blue
How many top matches do you want to see? 3
Top Matched Documents:
- The sky is blue and beautiful. (Score: 0.4080)
- Love this blue and beautiful sky! (Score: 0.3439)
- The brown fox is quick and the blue dog is lazy! (Score: 0.2560)
运行此练习以查看输出(在 VS Code 中按 F5)。输入任意文本,即可查看返回的文档结果。此搜索表单适用于将单词和短语与相似的单词和短语进行匹配。但这种搜索方式会忽略文档中的词语上下文和含义。下一节中,我们将探讨一种将文档转换为向量的方法,以便更好地保留其语义。
Run this exercise to see the output (F5 in VS Code). Enter any text you like, and see which documents are returned. This form of search works well for matching words and phrases against similar words and phrases, but it misses the context and meaning of the words in a document. In the next section, we’ll look at a way of transforming documents into vectors that better preserves their semantic meaning.
TF-IDF 是一种简单的词频统计方法,旨在捕捉文档中的语义信息。然而,它并不可靠,因为它只统计词频,而忽略了词语之间的关系。一种更好、更现代的方法是使用文档嵌入,这是一种文档向量化方法,能够更好地保留文档的语义信息。
TF–IDF is a simple form that tries to capture semantic meaning in documents. However, it’s unreliable because it only counts word frequency and doesn’t understand the relationships between words. A better and more modern method uses document embedding, a form of document vectorizing that better preserves the semantic meaning of the document.
嵌入网络是通过在大型数据集上训练神经网络来构建的,它将单词、句子或文档映射到高维向量,从而根据上下文和数据中的关系捕捉语义和句法关系。通常,你会使用在海量数据集上预训练的模型来嵌入文档并执行嵌入操作。许多来源都提供了这样的模型,包括 Hugging Face,当然还有 OpenAI。
Embedding networks are constructed by training neural networks on large datasets to map words, sentences, or documents to high-dimensional vectors, capturing semantic and syntactic relationships based on context and relationships in the data. You typically use a pretrained model trained on massive datasets to embed documents and perform embeddings. Models are available from many sources, including Hugging Face and, of course, OpenAI.
在接下来的场景中,我们将使用 OpenAI 的嵌入模型。这些模型通常非常适合捕捉嵌入文档的语义上下文。清单 8.4 展示了使用 OpenAI 将文档嵌入到向量中的相关代码,然后将这些向量降维到三维空间并渲染成图表。
In our next scenario, we’ll use an OpenAI embedding model. These models are typically perfect for capturing the semantic context of embedded documents. Listing 8.4 shows the relevant code that uses OpenAI to embed the documents into vectors that are then reduced to three dimensions and rendered into a plot.
document_visualizing_embeddings.py (relevant sections)
load_dotenv() #1
api_key = os.getenv('OPENAI_API_KEY')
if not api_key:
raise ValueError("No API key found. Please check your .env file.")
client = OpenAI(api_key=api_key) #1
def get_embedding(text, model="text-embedding-ada-002"): #2
text = text.replace("\n", " ")
return client.embeddings.create(input=[text],
model=model).data[0].embedding #2
# Sample documents (omitted)
embeddings = [get_embedding(doc) for doc in documents] #3
embeddings_array = np.array(embeddings) #4
print(embeddings_array.shape)
pca = PCA(n_components=3) #5
reduced_embeddings = pca.fit_transform(embeddings_array)
当使用 OpenAI 模型嵌入文档时,它会将文本转换为 1536 维的向量。我们无法可视化如此多的维度,因此我们使用通过主成分分析 (PCA) 的降维技术将 1536 维的向量转换为 3 维。
When a document is embedded using an OpenAI model, it transforms the text into a vector with dimensions of 1536. We can’t visualize this number of dimensions, so we use a dimensionality reduction technique via principal component analysis (PCA) to convert the vector of size 1536 to 3 dimensions.
图 8.6 显示了在 VS Code 中运行该文件生成的输出结果。通过将嵌入降维到 3D,我们可以绘制输出图,以展示语义相似的文档是如何分组的。
Figure 8.6 shows the output generated from running the file in VS Code. By reducing the embeddings to 3D, we can plot the output to show how semantically similar documents are now grouped.
选择使用哪种嵌入模型或服务完全取决于您。OpenAI 的嵌入模型被认为是通用语义相似性分析的最佳选择。这使得这些模型成为大多数记忆和检索应用的标准。在理解了如何使用嵌入将文本向量化并存储在向量数据库中之后,我们可以在下一节中讨论一个更实际的例子。
The choice of which embedding model or service you use is up to you. The OpenAI embedding models are considered the best for general semantic similarity. This has made these models the standard for most memory and retrieval applications. With our understanding of how text can be vectorized with embeddings and stored in a vector database, we can move on to a more realistic example in the next section.
我们可以把所有元素结合起来,用一个名为 Chroma DB 的本地矢量数据库来看一个完整的示例。矢量数据库有很多选择,但 Chroma DB 对于开发或小型项目来说是一个优秀的本地矢量存储方案。当然,还有许多功能更强大的选项,您可以稍后再考虑。
We can combine all the pieces and look at a complete example using a local vector database called Chroma DB. Many vector database options exist, but Chroma DB is an excellent local vector store for development or small-scale projects. There are also plenty of more robust options that you can consider later.
清单 8.5 显示了 document_query_chromadb.py 文件中新增的相关代码段。请注意,结果按距离而非相似度评分。余弦距离由以下公式计算:
Listing 8.5 shows the new and relevant code sections from the document_query_chromadb.py file. Note that the results are scored by distance and not by similarity. Cosine distance is determined by this equation:
余弦距离(A,B) = 1 – 余弦相似度(A,B)
Cosine Distance(A,B) = 1 – Cosine Similarity(A,B)
这意味着余弦距离的范围从 0(最相似)到 2(语义完全相反)。
This means that cosine distance will range from 0 for most similar to 2 for semantically opposite in meaning.
document_query_chromadb.py (relevant code sections)
embeddings = [get_embedding(doc) for doc in documents] #1
ids = [f"id{i}" for i in range(len(documents))] #1
chroma_client = chromadb.Client() #2
collection = chroma_client.create_collection(
name="documents") #2
collection.add( #3
embeddings=embeddings,
documents=documents,
ids=ids
)
def query_chromadb(query, top_n=2): #4
query_embedding = get_embedding(query)
results = collection.query(
query_embeddings=[query_embedding],
n_results=top_n
)
return [(id, score, text) for id, score, text in
zip(results['ids'][0],
results['distances'][0],
results['documents'][0])]
while True: #5
query = input("Enter a search query (or 'exit' to stop): ")
if query.lower() == 'exit':
break
top_n = int(input("How many top matches do you want to see? "))
search_results = query_chromadb(query, top_n)
print("Top Matched Documents:")
for id, score, text in search_results:
print(f"""
ID:{id} TEXT: {text} SCORE: {round(score, 2)}
""") #5
print("\n")
###Output
Enter a search query (or 'exit' to stop): dogs are lazy
How many top matches do you want to see? 3
Top Matched Documents:
ID:id7 TEXT: The dog is lazy but the brown fox is quick! SCORE: 0.24
ID:id5 TEXT: The brown fox is quick and the blue dog is lazy! SCORE: 0.28
ID:id2 TEXT: The quick brown fox jumps over the lazy dog. SCORE: 0.29
正如前面的场景所示,现在您可以使用语义信息而非仅仅使用关键词或短语来查询文档。这些场景应该能够帮助您理解检索模式的底层工作原理。在下一节中,我们将了解如何使用 LangChain 来实现这种检索模式。
As the earlier scenario demonstrated, you can now query the documents using semantic meaning rather than just key terms or phrases. These scenarios should now provide the background to see how the retrieval pattern works at a low level. In the next section, we’ll see how the retrieval pattern can be employed using LangChain.
LangChain 最初是一个开源项目,专门用于抽象跨多个数据源和向量存储的检索模式。此后,它已经发展成为功能更加强大的工具,但其核心功能仍然为实现检索提供了优秀的方案。
LangChain began as an open source project specializing in abstracting the retrieval pattern across multiple data sources and vector stores. It has since morphed into much more, but foundationally, it still provides excellent options for implementing retrieval.
图 8.7 展示了 LangChain 的一个流程图,该流程图描述了文档存储以供检索的过程。这些步骤可以全部或部分地复制到内存检索中。文档检索和内存检索的关键区别在于数据源以及内容的转换方式。
Figure 8.7 shows a diagram from LangChain that identifies the process of storing documents for retrieval. These same steps may be replicated in whole or in part to implement memory retrieval. The critical difference between document and memory retrieval is the source and how content is transformed.
我们将探讨如何使用 LangChain 实现这些步骤,并了解实现过程中涉及的细微差别和细节。下一节,我们将首先使用 LangChain 来拆分和加载文档。
We’ll examine how to implement each of these steps using LangChain and understand the nuances and details accompanying this implementation. In the next section, we’ll start by splitting and loading documents with LangChain.
检索机制会为给定的提示添加与请求相关的特定信息。例如,您可以请求有关本地文档的详细信息。在早期的语言模型中,由于令牌限制,无法将整个文档作为提示的一部分提交。
Retrieval mechanisms augment the context of a given prompt with specific information relevant to the request. For example, you may request detailed information about a local document. With earlier language models, submitting the whole document as part of the prompt wasn’t an option due to token limitations.
如今,我们可以将整篇文档作为快速请求的一部分提交给许多商业语言学习模型(例如 GPT-4 Turbo)。然而,由于词元数量的增加,结果可能并不理想,而且成本也可能更高。因此,更好的选择是将文档拆分,并使用相关部分来请求上下文——这正是 RAG 和内存模型所做的。
Today, we could submit a whole document for many commercial LLMs, such as GPT-4 Turbo, as part of a prompt request. However, the results may not be better and would likely cost more because of the increased number of tokens. Therefore, a better option is to split the document and use the relevant parts to request context—precisely what RAG and memory do.
将文档拆分为语义上和具体内容相关的部分至关重要。图 8.8 展示了如何拆分包含鹅妈妈童谣的 HTML 文档。通常,将文档拆分为上下文相关的语义块需要仔细考虑。
Splitting a document is essential in breaking down content into semantically and specifically relevant sections. Figure 8.8 shows how to break down an HTML document containing the Mother Goose nursery rhymes. Often, splitting a document into contextual semantic chunks requires careful consideration.
理想情况下,我们将文档分割成块时,应根据相关性和语义进行划分。虽然语言学习模型(LLM)或代理可以帮助我们实现这一点,但我们将重点介绍 LangChain 中现有的文档分割工具包选项。本章稍后将介绍一种语义函数,它可以帮助我们对内容进行语义分割,以便进行词嵌入。
Ideally, when we split documents into chunks, they are broken down by relevance and semantic meaning. While an LLM or agent could help us with this, we’ll look at current toolkit options within LangChain for splitting documents. Later in this chapter, we’ll look at a semantic function that can assist us in semantically dividing content for embeddings.
在下一个练习中,请在 VS Code 中打开 langchain_load_splitting.py,如清单 8.6 所示。这段代码接续上一节清单 8.5 的内容。这次我们加载的是鹅妈妈童谣,而不是示例文档。
For the next exercise, open langchain_load_splitting.py in VS Code, as shown in listing 8.6. This code shows where we left off from listing 8.5, in the previous section. Instead of using the sample documents, we’re loading the Mother Goose nursery rhymes this time.
langchain_load_splitting.py (sections and output)
from langchain_community.document_loaders
↪ import UnstructuredHTMLLoader #1
from langchain.text_splitter import RecursiveCharacterTextSplitter
#previous code
loader = UnstructuredHTMLLoader(
"sample_documents/mother_goose.xhtml") #2
data = loader.load #3
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=100,
chunk_overlap=25, #4
length_function=len,
add_start_index=True,
)
documents = text_splitter.split_documents(data)
documents = [doc.page_content
↪ for doc in documents] [100:350] #5
embeddings = [get_embedding(doc) for doc in documents] #6
ids = [f"id{i}" for i in range(len(documents))]
###Output
Enter a search query (or 'exit' to stop): who kissed the girls and made
them cry?
How many top matches do you want to see? 3
Top Matched Documents:
ID:id233 TEXT: And chid her daughter,
And kissed my sister instead of me. SCORE: 0.4…
Note in listing 8.6 that the HTML document gets split into 100-character chunks with a 25-character overlap. The overlap helps ensure that a chunk doesn't cut off a thought mid-sentence. We selected this splitter for the exercise because it's easy to use, set up, and understand.
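The mechanics of chunk_size and chunk_overlap can be sketched in a few lines of plain Python. This sliding-window splitter is only an illustration of the idea (LangChain's RecursiveCharacterTextSplitter actually splits recursively on separators such as newlines before falling back to character counts):

```python
def split_with_overlap(text, chunk_size=100, chunk_overlap=25):
    """Naive sliding-window splitter: each chunk starts
    chunk_size - chunk_overlap characters after the previous one."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks

# A 250-character sample yields three 100-character chunks,
# each sharing 25 characters with its neighbor.
sample = "".join(chr(65 + i % 26) for i in range(250))
chunks = split_with_overlap(sample, chunk_size=100, chunk_overlap=25)
```

The shared 25-character tail/head between consecutive chunks is what keeps a sentence that straddles a boundary retrievable from at least one chunk.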
Go ahead and run the langchain_load_splitting.py file in VS Code (F5). Enter a query, and see what results you get. The output in listing 8.6 shows good results for a specific example. Remember that we only embedded 250 document chunks to reduce costs and keep the exercise short. Of course, you can always try embedding the entire document or using a smaller input document.
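The SCORE value in the output is a vector-similarity measure between the query embedding and each document embedding. Cosine similarity is a common choice; here is a minimal pure-Python sketch of how such a score can be computed and used to rank documents (the exercise itself computes embeddings with get_embedding rather than the toy vectors used here):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: id0 points the same way as the query, id1 is orthogonal.
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {"id0": [1.0, 0.0, 1.0], "id1": [0.0, 1.0, 0.0]}
ranked = sorted(doc_vecs,
                key=lambda d: cosine_similarity(query_vec, doc_vecs[d]),
                reverse=True)
```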
Perhaps the most critical element in building proper retrieval is the process of document splitting. You can use numerous methods to split a document, including several at once: each method makes its own pass over the document, producing multiple embedding views of the same content. In the next section, we'll examine a more general technique for splitting documents that uses tokens and tokenization.
Tokenization is the process of breaking text into tokens. A token represents a succinct element of the text; it could be a word like hold or even a symbol such as the left curly brace ({), depending on what's relevant.
Splitting documents using tokenization provides a better base for how the text will be interpreted by language models and for semantic similarity. Tokenization also allows the removal of irrelevant characters, such as whitespace, making the similarity matching of documents more relevant and generally providing better results.
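To see the difference from character splitting, here is a toy token-based splitter. It uses a crude whitespace tokenizer as a stand-in for the tiktoken encoder used in the next listing, so whitespace is discarded and chunks are measured in tokens rather than characters:

```python
import re

def split_by_tokens(text, chunk_size=50, chunk_overlap=10):
    """Split on word tokens (whitespace discarded) with overlapping windows."""
    tokens = re.findall(r"\S+", text)  # crude stand-in for a tiktoken encoder
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

Because runs of whitespace collapse into single separators, documents with large blank regions produce chunks of uneven character length, which is exactly the behavior noted in the next exercise.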
For the next code exercise, open the langchain_token_splitting.py file in VS Code, as shown in listing 8.7. Now we split the document using tokenization, which breaks the document into sections of unequal size. The unequal size results from the large sections of whitespace of the original document.
langchain_token_splitting.py (relevant new code)

loader = UnstructuredHTMLLoader("sample_documents/mother_goose.xhtml")
data = loader.load()
text_splitter = CharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=50, chunk_overlap=10  #1
)
documents = text_splitter.split_documents(data)
documents = [doc for doc in documents][8:94]  #2
db = Chroma.from_documents(documents, OpenAIEmbeddings())

def query_documents(query, top_n=2):
    docs = db.similarity_search(query, top_n)  #3
    return docs

### Output
Created a chunk of size 68, which is longer than the specified 50
Created a chunk of size 67, which is longer than the specified 50  #4
Enter a search query (or 'exit' to stop): who kissed the girls and made them cry?
How many top matches do you want to see? 3
Top Matched Documents:
Document 1: GEORGY PORGY
Georgy Porgy, pudding and pie,
Kissed the girls and made them cry.
Run the langchain_token_splitting.py code in VS Code (F5). You can use the query from last time or your own. Notice how the results are significantly better than in the previous exercise. However, the results are still suspect because the query uses several similar words in the same order.
A better test would be to try a semantically similar phrase that uses different words and check the results. With the code still running, enter a new phrase to query: Why are the girls crying? Listing 8.8 shows the results of executing that query. If you run this example yourself and scroll through the output, you'll see Georgy Porgy appear in either the second or third returned document.
Enter a search query (or 'exit' to stop): Who made the girls cry?
How many top matches do you want to see? 3
Top Matched Documents:
Document 1: WILLY, WILLY
Willy, Willy Wilkin…
This exercise shows how various retrieval methods can be employed to return documents semantically. With this base established, we can see how RAG can be applied to knowledge and memory systems. The following section discusses RAG as it applies to knowledge in agents and agentic systems.
Knowledge in agents encompasses employing RAG to search semantically across unstructured documents. These documents could be anything from PDFs to Microsoft Word documents and all text, including code. Agentic knowledge also includes using unstructured documents for Q&A, reference lookup, information augmentation, and other future patterns.
Nexus, the agent platform developed in tandem with this book and introduced in the previous chapter, employs complete knowledge and memory systems for agents. In this section, we’ll uncover how the knowledge system works.
To install Nexus for just this chapter, see listing 8.9. Open a terminal within the chapter_08 folder, and execute the commands in the listing to download, install, and run Nexus in normal or development mode. If you want to refer to the code, you should install the project in development mode and configure the debugger to run the Streamlit app from VS Code. Refer to chapter 7 if you need a refresher on any of these steps.
# to install and run
pip install git+https://github.com/cxbxmxcx/Nexus.git
nexus run

# to install in development mode
git clone https://github.com/cxbxmxcx/Nexus.git
# install the cloned repository in editable mode
pip install -e Nexus
Regardless of which method you decide to run the app in after you log in, navigate to the Knowledge Store Manager page, as shown in figure 8.9. Create a new Knowledge Store, and then upload the sample_documents/back_to_the_future.txt movie script.
The script is a large document, and it may take a while to load, chunk, and embed the parts into the Chroma DB vector database. Wait for the indexing to complete, and then you can inspect the embeddings and run a query, as shown in figure 8.10.
Now, we can connect the knowledge store to a supported agent and ask questions. Use the top-left selector to choose the chat page within the Nexus interface. Then, select an agent and the time_travel knowledge store, as shown in figure 8.11. You will also need to select an agent engine that supports knowledge. Each of the multiple agent engines requires the proper configuration to be accessible.
Currently, as of this chapter, Nexus supports access to only a single knowledge store at a time. In a future version, agents may be able to select multiple knowledge stores at a time. This may include more advanced options, from semantic knowledge to employing other forms of RAG.
You can also configure the RAG settings within the Configuration tab of the Knowledge Store Manager page, as shown in figure 8.12. As of now, you can select from the type of splitter (Chunking Option field) to chunk the document, along with the Chunk Size field and Overlap field.
The loading, splitting, chunking, and embedding options provided are the only basic options supported by LangChain for now. In future versions of Nexus, more options and patterns will be offered. The code to support other options can be added directly to Nexus.
We won’t cover the code that performs the RAG as it’s very similar to what we already covered. Feel free to review the Nexus code, particularly the KnowledgeManager class in the knowledge_manager.py file.
While the retrieval patterns for knowledge and memory are quite similar for augmentation, the two patterns differ when it comes to populating the stores. In the next section, we’ll explore what makes memory in agents unique.
Memory in agents and AI applications is often described in the same terms as cognitive memory functions. Cognitive memory describes the type of memory we use to remember what we did 30 seconds ago or how tall we were 30 years ago. Computer memory is also an essential element of agent memory, but one we won’t consider in this section.
Figure 8.13 shows how memory is broken down into sensory, short-term, and long-term memory. This memory can be applied to AI agents, and this list describes how each form of memory maps to agent functions:
While memory uses the exact same retrieval and augmentation mechanisms as knowledge, it typically differs significantly when updating or appending memories. Figure 8.14 highlights the process of capturing and using memories to augment prompts. Because memories are often different from the size of complete documents, we can avoid using any splitting or chunking mechanisms.
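That capture-and-augment loop can be sketched without any framework. Because memories are short statements, each one is stored and embedded whole, with no splitting step. The function names and the word-overlap scoring below are illustrative stand-ins (Nexus uses real embeddings and an LLM memory function):

```python
def score(memory, query):
    """Toy relevance score: fraction of query words present in the memory."""
    q = set(query.lower().split())
    m = set(memory.lower().split())
    return len(q & m) / len(q)

memory_store = []

def remember(statement):
    memory_store.append(statement)  # stored whole: no splitting or chunking

def augment_prompt(query, top_n=2):
    """Prepend the top-matching memories to the user's prompt."""
    top = sorted(memory_store, key=lambda m: score(m, query), reverse=True)[:top_n]
    return "Relevant memories:\n" + "\n".join(top) + f"\n\nUser: {query}"

remember("the user prefers science fiction movies")
remember("the user lives in Calgary")
prompt = augment_prompt("recommend a movie the user would like")
```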
Nexus provides a mechanism like the knowledge store, allowing users to create memory stores that can be configured for various uses and applications. It also supports some of the more advanced memory forms highlighted in figure 8.13. The following section will examine how basic memory stores work in Nexus.
Memory stores operate and are constructed like knowledge stores in Nexus. They both heavily rely on the retrieval pattern. What differs is the extra steps memory systems take to build new memories.
Go ahead and start Nexus, and refer to listing 8.9 if you need to install it. After logging in, select the Memory page, and create a new memory store, as shown in figure 8.15. Select an agent engine, and then add a few personal facts and preferences about yourself.
The reason we need an agent (LLM) was shown in figure 8.14 earlier. When information is fed into a memory store, it’s generally processed through an LLM using a memory function, whose purpose is to process the statements/conversations into semantically relevant information related to the type of memory.
Listing 8.10 shows the conversational memory function used to extract information from a conversation into memories. Yes, this is just the header portion of the prompt sent to the LLM, instructing it how to extract information from a conversation.
Summarize the conversation and create a set of statements that summarize the conversation. Return a JSON object with the following keys: 'summary'. Each key should have a list of statements that are relevant to that category. Return only the JSON object and nothing else.
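Once the LLM returns the JSON object described by this prompt, turning it into stored memories is a simple parsing step. A sketch, assuming a well-formed response (the response text here is invented for illustration):

```python
import json

# A hypothetical LLM response following the memory-function prompt
llm_response = """
{
  "summary": [
    "The user enjoys hiking on weekends.",
    "The user is learning Python."
  ]
}
"""

def extract_memories(response_text):
    """Flatten each key's list of statements into individual memories."""
    data = json.loads(response_text)
    statements = []
    for key, items in data.items():
        statements.extend(items)
    return statements

memories = extract_memories(llm_response)  # each statement is embedded and stored
```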
After you generate a few relevant memories about yourself, return to the Chat area in Nexus, enable the my_memory memory store, and see how well the agent knows you. Figure 8.16 shows a sample conversation using a different agent engine.
This is an example of a basic memory pattern that extracts facts/preferences from conversations and stores them in a vector database as memories. Numerous other implementations of memory follow those displayed earlier in figure 8.13. We’ll implement those in the next section.
Psychologists categorize memory into multiple forms, depending on what information is remembered. Semantic, episodic, and procedural memory all represent different types of information. Episodic memories are about events, procedural memories are about processes or steps, and semantic memories represent meaning and can include feelings or emotions. Other forms of memory (geospatial is another) aren't described here but could be.
Because these memories rely on an additional level of categorization, they also rely on another level of semantic categorization. Some platforms, such as Semantic Kernel (SK), refer to this as semantic memory. This can be confusing because semantic categorization is also applied to extract episodic and procedural memories.
Figure 8.17 shows the semantic categorization process, sometimes also referred to as semantic memory. The difference from regular memory is the additional step of processing the input semantically and extracting relevant questions that can be used to query the memory's vector database.
The benefit of using semantic augmentation is the increased ability to extract more relevant memories. We can see this in operation by jumping back into Nexus and creating a new semantic memory store.
Figure 8.18 shows how to configure a new memory store using semantic memory. As of yet, you can’t configure the specific function prompts for memory, augmentation, and summarization. However, it can be useful to read through each of the function prompts to gain a sense of how they work.
Now, if you go back and add facts and preferences, they will convert to the semantics of the relevant memory type. Figure 8.19 shows an example of memories being populated for the same set of statements into two different forms of memory. Generally, the statements entered into memory would be more specific to the form of memory.
Memory and knowledge can significantly assist an agent with various application types. Indeed, a single memory/knowledge store could feed one or multiple agents, allowing for further specialized interpretations of both types of stores. We’ll finish out the chapter by discussing memory/knowledge compression next.
Much like our own memory, memory stores can become cluttered with redundant information and numerous unrelated details over time. Internally, our minds deal with this clutter by compressing or summarizing memories. Our minds favor more significant details over less important ones, along with memories that are accessed more frequently.
We can apply similar principles of memory compression to agent memory and other retrieval systems to extract significant details. The principle of compression is similar to semantic augmentation but adds another layer that first clusters groups of related memories so they can be summarized collectively.
Figure 8.20 shows the process of memory/knowledge compression. Memories or knowledge are first clustered using an algorithm such as k-means. Then, the groups of memories are passed through a compression function, which summarizes and collects the items into more succinct representations.
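The cluster-then-summarize process can be sketched with a minimal k-means over toy 2D "embeddings." The compression function below just concatenates each cluster's items; in a real system, such as Nexus, that step would be an LLM summarization call:

```python
import math

def kmeans(points, k, iters=20):
    """Minimal k-means: returns a cluster index per point."""
    # Spread initial centers across the data (real implementations use k-means++)
    stride = max(1, len(points) // k)
    centers = [points[i * stride] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: label each point with its nearest center
        for i, p in enumerate(points):
            labels[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Update step: move each center to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if labels[i] == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Toy 2D embeddings of six memories forming two obvious groups
memories = ["likes tea", "drinks tea daily", "tea with milk",
            "plays chess", "chess on Sundays", "studies chess openings"]
embeds = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels = kmeans(embeds, k=2)

def compress(memories, labels):
    """Stand-in compression: join each cluster's items (an LLM would summarize)."""
    clusters = {}
    for mem, lab in zip(memories, labels):
        clusters.setdefault(lab, []).append(mem)
    return ["; ".join(items) for items in clusters.values()]

compressed = compress(memories, labels)  # one summary item per cluster
```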
Nexus provides for both knowledge and memory store compression using k-means optimal clustering. Figure 8.21 shows the compression interface for memory. Within the compression interface, you’ll see the items displayed in 3D and clustered. The size (number of items) of the clusters is shown in the left table.
Compressing memories and even knowledge is generally recommended if the number of items in a cluster is large or unbalanced. Each use case for compression may vary depending on the use and application of memories. Generally, though, if an inspection of the items in a store contains repetitive or duplicate information, it’s a good time for compression. The following is a summary of use cases for applications that would benefit from compression.
Knowledge retrieval and augmentation have also been shown to benefit significantly from compression. Results will vary by use case, but generally, the more verbose the source of knowledge, the more it will benefit from compression. Documents that feature literary prose, such as stories and novels, will benefit more than, say, a base of code. However, if the code is likewise very repetitive, compression could also be shown to be beneficial.
Memory will often benefit from periodic compression, whereas knowledge stores typically benefit only on the first load. How frequently you apply compression will depend greatly on the memory's use, frequency, and quantity.
Multiple passes of compression at the same time have been shown to improve retrieval performance. Other patterns have also suggested using memory or knowledge at various levels of compression. For example, a knowledge store compressed twice yields three different levels of knowledge.
If a system is specialized to a particular source of knowledge and that system also employs memories, there may be further optimization to consolidate stores. Another approach is to populate memory with the starting knowledge of a document directly.
In more advanced systems, we’ll look at agents employing multiple memory and knowledge stores relevant to their workflow. For example, an agent could employ individual memory stores as part of its conversations with individual users, perhaps including the ability to share different groups of memory with different groups of individuals. Memory and knowledge retrieval are cornerstones of agentic systems, and we can now summarize what we covered and review some learning exercises in the next section.
Use the following exercises to improve your knowledge of the material:
Objective —Understand the effect of document splitting on retrieval efficiency by using LangChain.
Tasks:
Objective —Compare the effectiveness of various vectorization techniques by performing semantic searches.
Tasks:
Objective —Apply theoretical knowledge of RAG in a practical context using LangChain.
Tasks:
Objective —Understand how different splitting patterns and compression affect knowledge retrieval.
Tasks:
Objective —Understand the uniqueness and use cases of different memory store types.
Tasks:
In this chapter, we delve into the Test Changes Systematically prompt engineering strategy. If you recall, we covered the grand strategies of the OpenAI prompt engineering framework in chapter 2. These strategies are instrumental in helping us build better prompts and, consequently, better agent profiles and personas. Understanding this strategy is key to our prompt engineering journey.
Test Changes Systematically is such a core facet of prompt engineering that Microsoft developed a tool around this strategy called prompt flow, described later in this chapter. Before getting to prompt flow, we need to understand why we need systematic prompt engineering.
Prompt engineering, by its nature, is an iterative process. When building a prompt, you’ll often iterate and evaluate. To see this concept in action, consider the simple application of prompt engineering to a ChatGPT question.
You can follow along by opening your browser to ChatGPT (https://chat.openai.com/), entering the following (text) prompt into ChatGPT, and clicking the Send Message button (an example of this conversation is shown in figure 9.1, on the left side):
can you recommend something
We can see that the response from ChatGPT is asking for more information. Go ahead and open a new conversation with ChatGPT, and enter the following prompt, as shown in figure 9.1, on the right side:
Can you please recommend a time travel movie set in the medieval period.
The results in figure 9.1 show a clear difference between leaving out details and being more specific in your request. We just applied the tactic of politely Writing Clear Instructions, and ChatGPT provided us with a good recommendation. But also notice how ChatGPT itself guides the user into better prompting. The refreshed screen shown in figure 9.2 shows the OpenAI prompt engineering strategies.
We just applied simple iteration to improve our prompt. We can extend this example by using a system prompt/message. Figure 9.3 demonstrates the use and role of the system prompt in iterative communication. In chapter 2, we used the system message/prompt in various examples.
You can also try this in ChatGPT. This time, enter the following prompt and include the word system in lowercase, followed by a new line (enter a new line in the message window without sending the message by pressing Shift-Enter):
system
You are an expert on time travel movies.
ChatGPT will respond with some pleasant comments, as shown in figure 9.4. Because of this, it’s happy to accept its new role and asks for any follow-up questions. Now enter the following generic prompt as we did previously:
can you recommend something
We’ve just seen the iteration of refining a prompt, the prompt engineering, to extract a better response. This was accomplished over three different conversations using the ChatGPT UI. While not the most efficient way, it works.
However, we haven’t defined the iterative flow for evaluating the prompt and determining when a prompt is effective. Figure 9.5 shows a systemic method of prompt engineering using a system of iteration and evaluation.
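The iterate-and-evaluate loop can be mocked in a few lines. Both mock_llm and the two-term rubric below are invented stand-ins meant only to show the shape of systematic testing: run each prompt variant, score the response against fixed criteria, and keep the variant that scores best:

```python
def evaluate(response, required_terms):
    """Toy rubric: fraction of required terms present in the response."""
    return sum(t in response.lower() for t in required_terms) / len(required_terms)

def mock_llm(prompt):
    # Stand-in for a real model call, keyed on prompt specificity
    if "medieval" in prompt and "time travel" in prompt:
        return ("Try 'A Knight in Camelot', a time travel comedy "
                "set in medieval England.")
    return "Could you tell me more about what you like?"

prompts = ["can you recommend something",
           "Can you please recommend a time travel movie set in the medieval period."]
scores = [evaluate(mock_llm(p.lower()), ["time travel", "medieval"])
          for p in prompts]
# The more specific prompt scores higher on the rubric.
```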
The system of iterating and evaluating prompts embodies the broad Test Changes Systematically strategy. Evaluating the performance and effectiveness of prompts is still a new discipline, but we'll borrow techniques from education, such as rubrics and grounding, which we'll explore in a later section of this chapter. However, as spelled out in the next section, we first need to understand the difference between a persona and an agent profile.
An agent profile is an encapsulation of component prompts or messages that describe an agent. It includes the agent’s persona, special instructions, and other strategies that can guide the user or other agent consumers.
Figure 9.6 shows the main elements of an agent profile. These elements map to prompt engineering strategies described in this book. Not all agents will use all the elements of a full agent profile.
At a basic level, an agent profile is a set of prompts describing the agent. It may include other external elements related to actions/tools, knowledge, memory, reasoning, evaluation, planning, and feedback. The combination of these elements comprises an entire agent prompt profile.
Prompts are the heart of an agent’s function. A prompt or set of prompts drives each of the agent components in the profile. For actions/tools, these prompts are well defined, but as we’ve seen, prompts for memory and knowledge can vary significantly by use case.
The definition of an AI agent profile is more than just a system prompt. Prompt flow can allow us to construct the prompts and code comprising the agent profile but also include the ability to evaluate its effectiveness. In the next section, we’ll open up prompt flow and start using it.
Prompt flow is a tool developed by Microsoft within its Azure Machine Learning Studio platform. The tool was later released as an open source project on GitHub, where it has attracted more attention and use. While initially intended as an application platform, it has since shown its strength in developing and evaluating prompts/profiles.
Because prompt flow was initially developed to run on Azure as a service, it features a robust core architecture. The tool supports multi-threaded batch processing, which makes it ideal for evaluating prompts at scale. The following section will examine the basics of starting with prompt flow.
There are a few prerequisites to undertake before working through the exercises in this book. The relevant prerequisites for this section and chapter are shown in the following list; make sure to complete them before attempting the exercises:
Install the prompt flow packages with pip, as shown here:
pip install promptflow promptflow-tools
Open up VS Code to the book's source code folder, chapter_09. Ensure that you have a virtual environment connected and that you've installed the prompt flow packages and extension.
First, you’ll want to create a connection to your LLM resource within the prompt flow extension. Open the prompt flow extension within VS Code, and then click to open the connections. Then, click the plus sign beside the LLM resource to create a new connection, as shown in figure 9.7.
This will open a YAML file where you’ll need to populate the connection name and other information relevant to your connection. Follow the directions, and don’t enter API keys into the document, as shown in figure 9.8.
When the connection information is entered, click the Create Connection link at the bottom of the document. This will open a terminal prompt below the document, asking you to enter your key. Depending on your terminal configuration, you may be unable to paste (Ctrl-V, Cmd-V). Alternatively, you can paste the key by hovering the mouse cursor over the terminal and right-clicking on Windows.
We’ll now test the connection by first opening the simple flow in the chapter_09/promptflow/simpleflow folder. Then, open the flow.dag.yaml file in VS Code. This is a YAML file, but the prompt flow extension provides a visual editor that is accessible by clicking the Visual Editor link at the top of the file, as shown in figure 9.9.
After the visual editor window is opened, you’ll see a graph representing the flow and the flow blocks. Double-click the recommender block, and set the connection name, API type, and model or deployment name, as shown in figure 9.10.
A prompt flow is composed of a set of blocks starting with an Inputs block and terminating in an Outputs block. Within this simple flow, the recommender block represents the LLM connection and the prompt used to converse with the model. The echo block for this simple example echoes the input.
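As a rough sketch of how those blocks are wired together, a flow.dag.yaml for a flow like this might look as follows. Note that the node names, paths, deployment name, and connection name here are illustrative assumptions, not copied from the book's repository:

```yaml
inputs:
  user_input:
    type: string
    default: recommend some movies
outputs:
  output:
    type: string
    reference: ${echo.output}     # the flow output comes from the echo block
nodes:
- name: recommender               # LLM block: holds the connection and prompt
  type: llm
  source:
    type: code
    path: recommender.jinja2
  inputs:
    deployment_name: gpt-4
    temperature: 0.7
    user_input: ${inputs.user_input}
  connection: my_llm_connection
  api: chat
- name: echo                      # Python block: simply echoes its input
  type: python
  source:
    type: code
    path: echo.py
  inputs:
    input: ${recommender.output}
```

Each node's inputs reference either the flow inputs or an upstream node's output, which is what forms the directed acyclic graph (DAG) the filename refers to.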
When creating a connection to an LLM, either in prompt flow or through an API, here are the crucial parameters we always need to consider (prompt flow documentation: https://microsoft.github.io/promptflow):
- API type—chat for a chat completion API, such as GPT-4, or completion for the older completion models, such as the OpenAI Davinci.
- Temperature—1 represents a high variability of responses, while 0 indicates a desire for no variability. This is a critical parameter to understand and, as we’ll see, will vary by use case.
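The temperature parameter is applied inside the model's token-sampling step, but its effect can be illustrated with a toy softmax sketch in pure Python. This is not the actual API behavior, just an intuition builder with made-up token scores:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by temperature."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                       # hypothetical token scores
hot = softmax_with_temperature(logits, 1.0)    # high variability
cold = softmax_with_temperature(logits, 0.1)   # near-deterministic
```

At the low temperature, nearly all probability mass concentrates on the top-scoring token, which is why low temperatures yield repeatable responses.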
After configuring the LLM block, scroll up to the Inputs block section, and review the primary input shown in the user_input field, as shown in figure 9.11. Leave it as the default, and then click the Play button at the top of the window.
All the blocks in the flow will run, and the results will be shown in the terminal window. What you should find interesting is that the output shows recommendations for time travel movies. This is because the recommender block already has a simple profile set, and we’ll see how that works in the next section.
The flow responds with time travel movie recommendations because of the prompt or profile it uses. By default, prompt flow uses Jinja2 templates to define the content of the prompt or what we’ll call a profile. For the purposes of this book and our exploration of AI agents, we’ll refer to these templates as the profile of a flow or agent.
While prompt flow doesn’t explicitly refer to itself as an assistant or agent engine, it certainly meets the criteria of producing a proxy and general types of agents. As you’ll see, prompt flow even supports deployments of flows into containers and as services.
Open VS Code to chapter_09/promptflow/simpleflow/flow.dag.yaml, and open the file in the visual editor. Then, locate the Prompt field, and click the recommended.jinja2 link, as shown in figure 9.12.
Jinja is a templating engine, and Jinja2 is a particular version of that engine. Templates are an excellent way of defining the layout and parts of any form of text document. They have been extensively used to produce HTML, JSON, CSS, and other document forms. In addition, they support the ability to apply code directly into the template. While there is no standard way to construct prompts or agent profiles, our preference in this book is to use templating engines such as Jinja.
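Jinja2 itself supports loops, conditionals, and filters; to illustrate just the core idea of a templated profile using nothing but the standard library, here is a minimal stand-in built on string.Template. The profile text and placeholder names are invented for illustration, not the book's recommender.jinja2:

```python
from string import Template

# A profile template in the spirit of a prompt flow Jinja2 profile.
profile = Template(
    "system:\n"
    "You are a helpful assistant that recommends $subject $format.\n"
    "user:\n"
    "$user_input"
)

# Substituting values produces the final prompt sent to the LLM.
prompt = profile.substitute(
    subject="time travel",
    format="movies",
    user_input="Please recommend three titles.",
)
```

A real Jinja2 profile would use `{{ user_input }}` placeholders and could embed logic such as loops over prior chat history, but the render-with-inputs workflow is the same.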
At this point, change the role within the system prompt of the recommended.jinja2 template. Then, run all blocks of the flow by opening the flow in the visual editor and clicking the Play button. The next section will look at other ways of running prompt flow for testing or actual deployment.
Because prompt flow was also designed to be deployed as a service, it supports a couple of ways to deploy as an app or API quickly. Prompt flow can be deployed as a local web application and API running from the terminal or as a Docker container.
Return to the flow.dag.yaml file in the visual editor from VS Code. At the top of the window beside the Play button are several options we’ll want to investigate further. Click the Build button as shown in figure 9.13, and then select to deploy as a local app. A new YAML file will be created to configure the app. Leave the defaults, and click the Start Local App link.
This will launch the flow as a local web application, and you’ll see a browser tab open, as shown in figure 9.14. Enter some text into the user_input field, which is marked as required with a red asterisk. Click Enter and wait a few seconds for the reply.
You should see a reply like the one shown earlier in figure 9.12, where the flow or agent replies with a list of time travel movies. This is great—we’ve just developed our first agent profile and the equivalent of a proxy agent. However, we need to determine how successful or valuable the recommendations are. In the next section, we explore how to evaluate prompts and profiles.
A key element of any prompt or agent profile is how well it performs its given task. As we see in our recommendation example, prompting an agent profile to give a list of recommendations is relatively easy, but knowing whether those recommendations are helpful requires us to evaluate the response.
Fortunately, prompt flow has been designed to evaluate prompts/profiles at scale. The robust infrastructure allows for the evaluation of LLM interactions to be parallelized and managed as workers, allowing hundreds of profile evaluations and variations to happen quickly.
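Prompt flow manages its own worker pool internally, but the parallel-evaluation idea can be sketched with the standard library's concurrent.futures. The evaluate function below is a stand-in for a real LLM call, and its scoring logic is invented purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(profile_input: str) -> dict:
    # Stand-in for a real LLM evaluation call; scores deterministically here.
    return {"input": profile_input, "score": len(profile_input) % 5 + 1}

inputs = [f"variation-{i}" for i in range(8)]

# Evaluate all variations concurrently across a small pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate, inputs))
```

Because LLM calls are I/O bound, a thread pool like this (or an async client) keeps hundreds of evaluations in flight without waiting on each response serially.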
In the next section, we look at how prompt flow can be configured to run prompt/profile variations against each other. We’ll need to understand this before evaluating profiles’ performance.
Prompt flow provides a mechanism to allow for multiple variations within an LLM prompt/profile. This tool is excellent for comparing subtle or significant differences between profile variations. When used in performing bulk evaluations, it can be invaluable for quickly assessing the performance of a profile.
Open the recommender_with_variations/flow.dag.yaml file in VS Code and the flow visual editor, as shown in figure 9.15. This time, we’re making the profile more generalized and allowing for customization at the input level. This allows us to expand our recommendations to anything and not just time travel movies.
The new inputs Subject, Genre, Format, and Custom allow us to define a profile that can easily be adjusted to any recommendation. This also means that we must prime the inputs based on the recommendation use case. There are multiple ways to prime these inputs; two examples of priming inputs are shown in figure 9.16. The figure shows two options, options A and B, for priming inputs. Option A represents the classic UI; perhaps there are objects for the user to select the subject or genre, for example. Option B places a proxy/chat agent to interact with the user better to understand the desired subject, genre, and so on.
Even considering the power of LLMs, you may still want or need to use option A. The benefit of option A is that you can constrain and validate the inputs much like you do with any modern UI. Alternatively, the downside of option A is that the constrained behavior may limit and restrict future use cases.
Option B represents a more fluid and natural way without a traditional UI. It’s far more powerful and extensible than option A but also introduces more unknowns for evaluation. However, if the proxy agent that option B uses is written well, it can assist a lot in gathering better information from the user.
The option you choose will dictate how you need to evaluate your profiles. If you’re okay with a constrained UI, then it’s likely that the inputs will also be constrained to a set of discrete values. For now, we’ll assume option B for input priming, meaning the input values will be defined by their name.
To get back to VS Code and the visual view of the recommender with variants flow, click the icon shown earlier in figure 9.15 to open the variants and allow editing. Then, click the recommend.jinja2 and recommender_variant_1.jinja2 links to open the files side by side, as shown in figure 9.17.
Figure 9.17 demonstrates the difference between the variant profiles. One profile injects the inputs into the user prompt, and the other injects them into the system prompt. However, it’s essential to understand that variations can encompass more than profile design, as identified in table 9.1.
| Option | Example evaluation options | Notes |
|---|---|---|
| Jinja2 prompt template | Compare system prompt variations, user prompt variations, or mixed variations. | Countless combinations and techniques can be applied here, and prompt engineering techniques continue to evolve. |
| LLM | Compare GPT-3.5 and GPT-4. Compare GPT-4 and GPT-4 Turbo. Compare open source and commercial models. | This is an effective way to evaluate model performance and match a model to a prompt. It can also help you adapt a profile to work with open source and/or more economical models. |
| Temperature | Compare a temperature of 0 (no randomness) against a temperature of 1 (maximum randomness). | Changes in temperature can significantly alter the responses to some prompts, which may increase or decrease performance. |
| Max tokens | Compare small and large token limits. | This can help you minimize or maximize token usage. |
| Advanced parameters | Compare options such as top_p, presence_penalty, frequency_penalty, and logit_bias. | The use of these advanced parameters is covered in later chapters. |
| Function calling | Compare different ways of calling functions. | Function calling is covered later in this chapter. |
For this simple example, we’re just going to use prompt variations by varying the input to reflect in either the system or user prompt. Refer to figure 9.17 for what this looks like. We can then quickly run both variations by clicking the Play (Run All) button at the top and choosing both, as shown in figure 9.18.
In the terminal window, you’ll see the results of both runs. The results will likely look similar, so now we must move on to how we evaluate the difference between variations in the next section.
Evaluation of prompt/profile performance isn’t something we can typically do using a measure of accuracy or correct percentage. Measuring the performance of a profile depends on the use case and desired outcome. If that is as simple as determining if the response was right or wrong, all the better. However, in most cases, evaluation won’t be that simple.
In education, the rubric concept defines a structured set of criteria and standards a student must meet to receive a particular grade. A rubric can also be used to define a guide for the performance of a profile or prompt. We can follow these steps to define a rubric we can use to evaluate the performance of a profile or prompt:
Grounding is a concept that can be applied to profile and prompt evaluation—it defines how well a response is aligned with a given rubric’s specific criteria and standards. You can also think of grounding as the baseline expectation of a prompt or profile output.
This list summarizes some other important considerations when using grounding with profile evaluation:
A well-grounded response aligns with all the rubric criteria within the given context and objectives. Poorly grounded responses will fail or miss the entire criteria, context, and objectives.
As the concepts of rubrics and grounding may still be abstract, let’s look at applying them to our current recommender example. Following is a list that follows the process for defining a rubric as applied to our recommender example:
| Grade | Description |
|---|---|
| 1 | Poor alignment: the opposite of what would be expected given the criteria. |
| 2 | Bad alignment: doesn’t align with the given criteria. |
| 3 | Mediocre alignment: may or may not align with the given criteria. |
| 4 | Good alignment: perhaps not 100% aligned with the criteria, but otherwise a close match. |
| 5 | Excellent alignment: a great recommendation given the criteria. |
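For reference, the 1-5 grading rubric can also be captured directly in code. The helper below is a hypothetical convenience, not part of prompt flow:

```python
# The 1-5 alignment rubric, paraphrased as a simple lookup table.
RUBRIC = {
    1: "Poor alignment: the opposite of what the criteria call for.",
    2: "Bad alignment: does not align with the given criteria.",
    3: "Mediocre alignment: may or may not align with the criteria.",
    4: "Good alignment: close to the criteria, if not a perfect match.",
    5: "Excellent alignment: a great recommendation given the criteria.",
}

def describe(grade: int) -> str:
    """Return the rubric description, rejecting out-of-range grades."""
    if grade not in RUBRIC:
        raise ValueError(f"grade must be 1-5, got {grade}")
    return RUBRIC[grade]
```

Keeping the rubric in one place like this makes it easy to inject the same wording into both an evaluation prompt and any downstream reporting code.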
This basic rubric can now be applied to evaluate the responses for our profile. You can do this manually, or as you’ll see in the next section, using a second LLM profile.
This section will employ another LLM prompt/profile for evaluation and grounding. This second LLM prompt will add another block after the recommendations are generated. It will process the generated recommendations and evaluate each one, given the previous rubric.
Before GPT-4 and other sophisticated LLMs came along, we would have never considered using another LLM prompt to evaluate or ground a profile. You often want to use a different model when using LLMs to ground a profile. However, if you’re comparing profiles against each other, using the same LLM for evaluation and grounding is appropriate.
Open the recommender_with_LLM_evaluation\flow.dag.yaml file in the prompt flow visual editor, scroll down to the evaluate_recommendation block, and click the evaluate_recommendation.jinja2 link to open the file, as shown in figure 9.19. Each section of the rubric is identified in the figure.
We have a rubric that is not only well defined but also in the form of a prompt that can be used to evaluate recommendations. This allows us to evaluate the effectiveness of recommendations for a given profile—automatically. Of course, you can also use the rubric to score and evaluate the recommendations manually for a better baseline.
Note Using LLMs to evaluate prompts and profiles provides a strong baseline for comparing the performance of a profile. It can also do this without human bias in a controlled and repeatable manner. This provides an excellent mechanism to establish baseline groundings for any profile or prompt.
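To make the LLM-as-judge pattern concrete, here is a hypothetical sketch of how such an evaluation prompt might be assembled in code. The wording and function are illustrative assumptions, not the book's evaluate_recommendation.jinja2 template:

```python
def build_evaluation_prompt(criteria: dict, recommendation: str) -> str:
    """Compose an LLM-as-judge prompt from criteria and a recommendation."""
    lines = [
        "You are grading a recommendation on a 1-5 scale per criterion,",
        "where 1 is poor alignment and 5 is excellent alignment.",
        "Criteria:",
    ]
    for key, value in criteria.items():
        lines.append(f"- {key}: {value}")
    lines.append(f"Recommendation to grade: {recommendation}")
    lines.append("Reply with one 'criterion: score' line per criterion.")
    return "\n".join(lines)

prompt = build_evaluation_prompt(
    {"subject": "time travel", "format": "movies"},
    "Title: Primer",
)
```

The key design point is that the rubric, the criteria, and the item under test all land in one prompt, so the judging model has everything it needs to emit a parseable score per criterion.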
Returning to the recommender_with_LLM_evaluation flow visual editor, we can run the flow by clicking the Play button and observing the output. You can run a single recommendation or run both variations when prompted. The output of a single evaluation using the default inputs is shown in the following listing.
{
"recommendations": "Title: The Butterfly Effect
Subject: 5
Format: 5
Genre: 4
Title: Primer
Subject: 5
Format: 5
Genre: 4
Title: Time Bandits
Subject: 5
Format: 5
Genre: 5"
}
We now have a rubric for grounding our recommender, and the evaluation is run automatically using a second LLM prompt. In the next section, we look at how to perform multiple evaluations simultaneously and then at a total score for everything.
With our understanding of rubrics and grounding, we can now move on to evaluating and iterating the perfect profile. Before we do that, though, we need to clean up the output from the LLM evaluation block. This will require us to parse the recommendations into something more Pythonic, which we’ll tackle in the next section.
As the raw output from the evaluation block is text, we now want to parse that into something more usable. Of course, writing parsing functions is simple, but there are better ways to cast responses automagically. We covered better methods for returning responses in chapter 5, on agent actions.
Open chapter_09\prompt_flow\recommender_with_parsing\flow.dag.yaml in VS Code, and look at the flow in the visual editor. Locate the parsing_results block, and click the link to open the Python file in the editor, as shown in figure 9.20.
The code for the parsing_results.py file is shown in listing 9.2.
parsing_results.py
from promptflow import tool
@tool #1
def parse(input: str) -> list:
# Splitting the recommendations into individual movie blocks
rblocks = input.strip().split("\n\n") #2
# Function to parse individual recommendation block into dictionary
def parse_block(block):
lines = block.split('\n')
rdict = {}
for line in lines:
kvs = line.split(': ')
key, value = kvs[0], kvs[1]
rdict[key.lower()] = value #3
return rdict
parsed = [parse_block(block) for block in rblocks] #4
return parsed
We’re converting the recommendations output from listing 9.1, which is just a string, into a dictionary. So this code will convert this string into the JSON block shown next:
Before parsing:
"Title: The Butterfly Effect Subject: 5 Format: 5 Genre: 4 Title: Primer Subject: 5 Format: 5 Genre: 4 Title: Time Bandits Subject: 5 Format: 5 Genre: 5"
After parsing:
{
"title": " The Butterfly Effect
"subject": "5",
"format": "5",
"genre": "4"
},
{
"title": " Primer",
"subject": "5",
"format": "5",
"genre": "4"
},
{
"title": " Time Bandits",
"subject": "5",
"format": "5",
"genre": "5"
}
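The transformation above can be reproduced with a condensed, runnable version of the parsing step. This is a simplified sketch of listing 9.2 that also strips whitespace from values (the book's version keeps the leading space after each colon):

```python
def parse_recommendations(text: str) -> list:
    """Split 'Title/Subject/Format/Genre' blocks into dictionaries."""
    parsed = []
    for block in text.strip().split("\n\n"):      # one block per title
        rdict = {}
        for line in block.split("\n"):
            key, _, value = line.partition(":")   # split on the first colon
            rdict[key.strip().lower()] = value.strip()
        parsed.append(rdict)
    return parsed

sample = (
    "Title: The Butterfly Effect\nSubject: 5\nFormat: 5\nGenre: 4\n\n"
    "Title: Primer\nSubject: 5\nFormat: 5\nGenre: 4"
)
records = parse_recommendations(sample)
```

Using partition instead of split guards against titles that themselves contain a colon, a small robustness improvement over splitting on every colon.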
The output of this parsing_results block now gets passed to the output and is wrapped in a list of recommendations. We can see what all this looks like by running the flow.
Open flow.dag.yaml for the flow in the visual editor, and click the Play (Run All) button. Be sure to select to use both recommender variants. You’ll see both variations run and output to the terminal.
At this point, we have a full working recommendation and LLM evaluation flow that outputs a score for each criterion on each output. However, to do comprehensive evaluations of a particular profile, we want to generate multiple recommendations with various criteria. We’ll see how to do batch processing of flows in the next section.
In our generic recommendation profile, we want to evaluate how various input criteria can affect the generated recommendations. Fortunately, prompt flow can batch-process any variations we want to test. The limit is only the time and money we want to spend.
To perform batch processing, we must first create a JSON Lines (JSONL) or JSON list document of our input criteria. If you recall, our input criteria looked like the following in JSON format:
{
"subject": "time travel",
"format": "books",
"genre": "fantasy",
"custom": "don't include any R rated content"
}
We want to create a list of JSON objects like that just shown, preferably in a random manner. Of course, the simple way to do this is to prompt ChatGPT to create a JSONL document using the following prompt:
I am developing a recommendation agent. The agent will recommend anything given the following criteria:
1. subject - examples: time travel, cooking, vacation
2. format - examples: books, movies, games
3. genre: documentary, action, romance
4. custom: don't include any R rated content
Can you please generate a random list of these criteria and output it in the format of a JSON Lines file, JSONL. Please include 10 items in the list.
Try this out by going to ChatGPT and entering the preceding prompt. A previously generated file can be found in the flow folder, called \bulk_recommend.jsonl. The contents of this file have been shown here for reference:
{
"subject": "time travel",
"format": "books",
"genre": "fantasy",
"custom": "don't include any R rated content"
}
{
"subject": "space exploration",
"format": "podcasts",
"genre": "sci-fi",
"custom": "include family-friendly content only"
}
{
"subject": "mystery",
"format": "podcasts",
"genre": "fantasy",
"custom": "don't include any R rated content"
}
{
"subject": "space exploration",
"format": "podcasts",
"genre": "action",
"custom": "include family-friendly content only"
}
{
"subject": "vacation",
"format": "books",
"genre": "thriller",
"custom": "don't include any R rated content"
}
{
"subject": "mystery",
"format": "books",
"genre": "sci-fi",
"custom": "don't include any R rated content"
}
{
"subject": "mystery",
"format": "books",
"genre": "romance",
"custom": "don't include any R rated content"
}
{
"subject": "vacation",
"format": "movies",
"genre": "fantasy",
"custom": "don't include any R rated content"
}
{
"subject": "cooking",
"format": "TV shows",
"genre": "thriller",
"custom": "include family-friendly content only"
}
{
"subject": "mystery",
"format": "movies",
"genre": "romance",
"custom": "include family-friendly content only"
}
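If you'd rather not prompt ChatGPT, a small script can generate an equivalent JSONL file deterministically. The criteria pools below mirror the values in the examples; the seed and counts are arbitrary choices:

```python
import json
import random

SUBJECTS = ["time travel", "space exploration", "mystery", "vacation", "cooking"]
FORMATS = ["books", "movies", "podcasts", "TV shows", "games"]
GENRES = ["fantasy", "sci-fi", "action", "thriller", "romance"]
CUSTOMS = [
    "don't include any R rated content",
    "include family-friendly content only",
]

random.seed(42)  # fixed seed so the file is reproducible
lines = []
for _ in range(10):
    item = {
        "subject": random.choice(SUBJECTS),
        "format": random.choice(FORMATS),
        "genre": random.choice(GENRES),
        "custom": random.choice(CUSTOMS),
    }
    lines.append(json.dumps(item))

jsonl = "\n".join(lines)  # write this string to bulk_recommend.jsonl
```

Each line is an independent JSON object, which is exactly the JSONL shape prompt flow's batch runner expects.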
With this bulk file, we can run both variants using the various input criteria in the bulk JSONL file. Open the flow.dag.yaml file in the visual editor, click Batch (the beaker icon) to start the bulk-data loading process, and select the file as shown in figure 9.21. For some operating systems, this may appear as Local Data File.
After the bulk file is selected, a new YAML document will open with a Run link added at the bottom of the file, as shown in figure 9.22. Click the link to do the batch run of inputs.
At this point, a few things will happen. The flow visual editor will appear, and beside that a log file will open, showing the progress of the run. In the terminal window, you’ll see the various worker processes spawning and running.
Be patient. The batch run, even for 10 items, may take a few minutes or seconds, depending on various factors such as hardware, previous calls, and so on. Wait for the run to complete, and you’ll see a summary of results in the terminal.
You can also view the run results by opening the prompt flow extension and selecting the last run, as shown in figure 9.23. Then, you can dig into each run by clicking the table cells. A lot of information is exposed in this dialog, which can help you troubleshoot flows and profiles.
A lot of information is captured during a batch run, and you can explore much of it through the visualizer. More information can be found by clicking the output folder link from the terminal window. This will open another session of VS Code with the output folder, allowing you to review the run logs and other details.
Now that we’ve completed the batch run for each variant, we can apply grounding and evaluate the results of both prompts. The next section will use a new flow to perform the profile/prompt evaluation.
Open chapter_09\prompt_flow\evaluate_groundings\flow.dag.yaml in the visual editor, as shown in figure 9.24. There are no LLM blocks in the evaluation flow—just Python code blocks that will run the scoring and then aggregate the scores.
We can now look at the code for the scoring and aggregate blocks, starting with the scoring code in listing 9.3. This scoring code averages the score for each criterion into an average score. The output of the function is a list of processed recommendations.
line_process.py
from promptflow import tool

@tool
def line_process(recommendations: str): #1
inputs = recommendations
output = []
for data_dict in inputs: #2
total_score = 0
score_count = 0
for key, value in data_dict.items(): #2
if key != "title": #3
try:
total_score += float(value)
score_count += 1
data_dict[key] = float(value) #4
except:
pass
avg_score = total_score / score_count if score_count > 0 else 0
data_dict["avg_score"] = round(avg_score, 2) #5
output.append(data_dict)
return output
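To see what the scoring block computes, here is the same averaging logic applied by hand to one parsed record, re-sketched without the prompt flow decorator. The record values are taken from the earlier sample output:

```python
def average_score(record: dict) -> float:
    """Average the numeric criterion scores, skipping the title."""
    scores = [float(v) for k, v in record.items() if k != "title"]
    return round(sum(scores) / len(scores), 2) if scores else 0.0

record = {"title": "Primer", "subject": "5", "format": "5", "genre": "4"}
avg = average_score(record)  # (5 + 5 + 4) / 3, rounded to two decimals
```

This collapses the per-criterion grades into a single avg_score per recommendation, which the aggregate step can then average across the whole batch.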
From the grounded recommendations, we can move on to aggregating the scores with the aggregate block—the code for the aggregate block is shown in the following listing.
aggregate.py
from typing import List

from promptflow import log_metric, tool

@tool
def aggregate(processed_results: List[str]):
items = [item for sublist in processed_results
↪ for item in sublist] #1
aggregated = {}
for item in items:
for key, value in item.items():
if key == 'title':
continue
if isinstance(value, (float, int)): #2
if key in aggregated:
aggregated[key] += value
else:
aggregated[key] = value
for key, value in aggregated.items(): #3
value = value / len(items)
log_metric(key=key, value=value) #4
aggregated[key] = value
return aggregated
The result of the aggregations will be a summary score for each criterion and the average score. Since the evaluation/grounding flow is separate, it can be run over any recommendation run we perform. This will allow us to use the batch run results for any variation to compare results.
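The aggregation step can be checked by hand with a re-sketch of the aggregate block that drops the prompt flow log_metric call. The two-item batch below is invented sample data:

```python
def aggregate_scores(processed_results: list) -> dict:
    """Average each numeric criterion across all scored recommendations."""
    # Flatten the list of per-run lists into one list of records.
    items = [item for sublist in processed_results for item in sublist]
    totals = {}
    for item in items:
        for key, value in item.items():
            if key == "title" or not isinstance(value, (int, float)):
                continue
            totals[key] = totals.get(key, 0.0) + value
    return {key: total / len(items) for key, total in totals.items()}

batch = [
    [{"title": "Primer", "subject": 5.0, "genre": 4.0, "avg_score": 4.5}],
    [{"title": "Time Bandits", "subject": 5.0, "genre": 5.0, "avg_score": 5.0}],
]
summary = aggregate_scores(batch)
```

The summary gives one averaged value per criterion, which is exactly what the visualizer displays as the run-level metrics.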
We can run the grounding flow by opening flow.dag.yaml in the visual editor and clicking Batch (beaker icon). Then, when prompted, we select an existing run and then select the run we want to evaluate, as shown in figure 9.25. This will open a YAML file with the Run link at the bottom, as we’ve seen before. Click the Run link to run the evaluation.
After the run is completed, you’ll see a summary of the results in the terminal window. You can click the output link to open the folder in VS Code and analyze the results, but there is a better way to compare them.
Open the prompt flow extension, focus on the Batch Run History window, and scroll down to the Run against Run section, as shown in figure 9.26. Select the runs you want to compare—likely the ones near the top—so that the checkmark appears. Then, right-click the run, and select the Visualize Runs option. The Batch Run Visualization window opens, and you’ll see the metrics for each of the runs at the top.
We can now see a significant difference between profile/prompt variation 0, the user prompt, and variation 1, the system prompt. Refer to figure 9.15 if you need a refresher on what the prompts/profiles look like. At this point, it should be evident that injecting the input parameters into the system prompt provides better recommendations.
You can now go back and try other profiles or other variant options to see what effect this has on your recommendations. The possibilities are virtually endless, but hopefully you can see what an excellent tool prompt flow will be for building agent profiles and prompts.
Use the following exercises to improve your knowledge of the material:
Objective —Improve the recommendation results by creating and testing a new prompt variant in prompt flow.
Tasks:
Objective —Enhance the evaluation criteria by incorporating a custom field into the rubric and updating the evaluation flow.
Tasks:
Objective —Expand the application of prompt engineering by developing a new use case and creating an evaluation rubric.
Tasks:
Objective —Assess the performance of different open source LLMs by hosting a local server with LM Studio.
Tasks:
Objective —Apply prompt engineering strategies to build and evaluate new prompts or profiles using prompt flow.
Tasks:
Now that we’ve examined the patterns of memory and retrieval that define the semantic memory component in agents, we can take a look at the last and most instrumental component in agents: planning. Planning encompasses many facets, from reasoning, understanding, and evaluation to feedback.
To explore how LLMs can be prompted to reason, understand, and plan, we’ll demonstrate how to engage reasoning through prompt engineering and then expand that to planning. The planning solution provided by the Semantic Kernel (SK) encompasses multiple planning forms. We’ll finish the chapter by incorporating adaptive feedback into a new planner.
Figure 10.1 demonstrates the high-level prompt engineering strategies we’ll cover in this chapter and how they relate to the various techniques we’ll cover. Each of the methods showcased in the figure will be explored in this chapter, from the basics of solution/direct prompting, shown in the top-left corner, to self-consistency and tree of thought (ToT) prompting, in the bottom right.
Direct solution prompting is generally the first form of prompt engineering that users employ when asking LLMs questions or solving a particular problem. Given any LLM use, these techniques may seem apparent, but they are worth reviewing to establish the foundation of thought and planning. In the next section, we’ll start from the beginning, asking questions and expecting answers.
For the exercises in this chapter, we’ll employ prompt flow to build and evaluate the various techniques. (We already extensively covered this tool in chapter 9, so refer to that chapter if you need a review.) Prompt flow is an excellent tool for understanding how these techniques work and exploring the flow of the planning and reasoning process.
Open Visual Studio Code (VS Code) to the chapter 10 source folder. Create a new virtual environment for the folder, and install the requirements.txt file. If you need help setting up a chapter’s Python environment, refer to appendix B.
We’ll look at the first flow in the prompt_flow/question-answering-prompting folder. Open the flow.dag.yaml file in the visual editor, as shown in figure 10.2. On the right side, you’ll see the flow of components. At the top is the question_answer LLM prompt, followed by two Embedding components and a final LLM prompt, called evaluate, that performs the evaluation.
The flow.dag.yaml file is open in the visual editor, highlighting the components of the flow.
The breakdown in listing 10.1 shows the structure and components of the flow in more detail using a sort of YAML-shortened pseudocode. You can also see the input and outputs to the various components and a sample output from running the flow.
Inputs:
    context: the content to ask the question about
    question: question asked specific to the content
    expected: the expected answer
LLM: Question-Answer (the prompt used to ask the question)
    inputs: context and question
    outputs: the prediction/answer to the question
Embeddings: uses an LLM embedding model to create the embedding representation of the text
    Embedding_predicted: embeds the output of the Question-Answer LLM
    Embedding_expected: embeds the output of the expected answer
Python: Evaluation (Python code to measure embedding similarity)
    inputs:
        Embedding_predicted output
        Embedding_expected output
    outputs: the similarity score between predicted and expected
Outputs:
    context: -> input.context
    question: -> input.question
    expected: -> input.expected
    predicted: -> output.question_answer
    evaluation_score: output.evaluation

### Example Output
{
  "context": "Back to the Future (1985)…",
  "evaluation_score": 0.9567478002354606,
  "expected": "Marty traveled back in time 30 years.",
  "predicted": "Marty traveled back in time 30 years from 1985 to 1955 in the movie \"Back to the Future.\"",
  "question": "How far did Marty travel back in time in the movie Back to the Future (1985)"
}
Before running this flow, make sure your LLM block is configured correctly. This may require you to set up a connection to your chosen LLM. Again, refer to chapter 9 if you need a review on how to complete this. You’ll need to configure the LLM and Embedding blocks with your connection if you’re not using OpenAI.
After configuring your LLM connection, run the flow by clicking the Play button from the visual editor or using the Test (Shift-F5) link in the YAML editor window. If everything is connected and configured correctly, you should see output like that in listing 10.1.
Open the question_answer.jinja2 file in VS Code, as shown in listing 10.2. This listing shows the basic question-and-answer-style prompt. In this style of prompt, the system message describes the basic rules and provides the context to answer the question. In chapter 4, we explored the retrieval augmented generation (RAG) pattern, and this prompt follows a similar pattern.
system:
Answer the users question based on the context below. Keep the answer
short and concise. Respond "Unsure about answer" if not sure about the
answer.
Context: {{context}} #1
user:
Question: {{question}} #2
This exercise shows the simple method of using an LLM to ask questions about a piece of content. Then, the question response is evaluated using a similarity matching score. We can see from the output in listing 10.1 that the LLM does a good job of answering a question about the context. In the next section, we’ll explore a similar technique that uses direct prompting.
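The similarity matching score produced by the flow’s Python evaluation node is typically a cosine similarity between the two embedding vectors. A minimal sketch of that calculation, using toy three-dimensional vectors in place of real model embeddings:

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|); 1.0 means the vectors point the same way
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the predicted and expected embeddings
predicted = [0.1, 0.9, 0.4]
expected = [0.1, 0.8, 0.5]
print(round(cosine_similarity(predicted, expected), 3))
```

With real embeddings, near-identical answers score close to 1.0, which is the behavior the evaluation node relies on.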
Few-shot prompting is like question-and-answer prompting, but the makeup of the prompt is more about providing a few examples than about facts or context. This allows the LLM to bend to patterns or content not previously seen. While this approach sounds like question and answer, the implementation is quite different, and the results can be powerful.
Open prompt_flow/few-shot-prompting/flow.dag.yaml in VS Code and the visual editor. Most of the flow looks like the one pictured earlier in figure 10.2, and the differences are highlighted in listing 10.3, which shows a YAML pseudocode representation. The main differences between this and the previous flow are the inputs and LLM prompt.
Inputs:
    statement: introduces the context and then asks for output
    expected: the expected answer to the statement
LLM: few_shot (the prompt used to ask the question)
    inputs: statement
    outputs: the prediction/answer to the statement
Embeddings: uses an LLM embedding model to create the embedding representation of the text
    Embedding_predicted: embeds the output of the few_shot LLM
    Embedding_expected: embeds the output of the expected answer
Python: Evaluation (Python code to measure embedding similarity)
    inputs:
        Embedding_predicted output
        Embedding_expected output
    outputs: the similarity score between predicted and expected
Outputs:
    statement: -> input.statement
    expected: -> input.expected
    predicted: -> output.few_shot
    evaluation_score: output.evaluation

### Example Output
{
  "evaluation_score": 0.906647282920417, #1
  "expected": "We ate sunner and watched the setting sun.",
  "predicted": "After a long hike, we sat by the lake and enjoyed a peaceful sunner as the sky turned brilliant shades of orange and pink.", #2
  "statement": "A sunner is a meal we eat in Cananda at sunset, please use the word in a sentence" #3
}
Run the flow by pressing Shift-F5 or clicking the Play/Test button from the visual editor. You should see output like listing 10.3 where the LLM has used the word sunner (a made-up term) correctly in a sentence given the initial statement.
This exercise demonstrates the ability to use a prompt to alter the behavior of the LLM to be contrary to what it has learned. We’re changing what the LLM understands to be accurate. Furthermore, we then use that modified perspective to elicit the use of a made-up word.
Open the few_shot.jinja2 prompt in VS Code, shown in listing 10.4. This listing demonstrates setting up a simple persona, that of an eccentric dictionary maker, and then providing examples of words it has defined and used before. The base of the prompt allows for the LLM to extend the examples and produce similar results using other words.
system:
You are an eccentric word dictionary maker. You will be asked to
construct a sentence using the word.
The following are examples that demonstrate how to craft a sentence using
the word.
A "whatpu" is a small, furry animal native to Tanzania.
An example of a sentence that uses the word whatpu is: #1
We were traveling in Africa and we saw these very cute whatpus.
To do a "farduddle" means to jump up and down really fast. An example of a
sentence that uses the word farduddle is:
I was so excited that I started to farduddle. #2
Please only return the sentence requested by the user. #3
user:
{{statement}} #4
You may say we’re forcing the LLM to hallucinate here, but this technique is the basis for modifying behavior. It allows prompts to be constructed to guide an LLM to do everything contrary to what it learned. This foundation of prompting also establishes techniques for other forms of altered behavior. From the ability to alter the perception and background of an LLM, we’ll move on to demonstrate a final example of a direct solution in the next section.
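The mechanics of this kind of prompt are easy to reproduce: each few-shot example pairs a made-up definition with a sample sentence, and the pairs are concatenated into the system message. A small sketch along those lines, reusing the invented words from listing 10.4 (the exact wording here is illustrative, not the book’s template verbatim):

```python
# Each few-shot example pairs a made-up definition with a usage sentence
examples = [
    ('A "whatpu" is a small, furry animal native to Tanzania.',
     "We were traveling in Africa and we saw these very cute whatpus."),
    ('To do a "farduddle" means to jump up and down really fast.',
     "I was so excited that I started to farduddle."),
]

parts = ["You are an eccentric word dictionary maker. "
         "You will be asked to construct a sentence using the word."]
for definition, sentence in examples:
    parts.append(f"{definition}\nAn example sentence is: {sentence}")
parts.append("Please only return the sentence requested by the user.")
system_prompt = "\n\n".join(parts)
print(system_prompt)
```

Adding or swapping pairs in `examples` changes the behavior the LLM generalizes from, without touching the rest of the prompt.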
Zero-shot prompting or learning is the ability to generate a prompt in such a manner that allows the LLM to generalize. This generalization is embedded within the LLM and demonstrated through zero-shot prompting, where no examples are given, but instead a set of guidelines or rules are given to guide the LLM.
Employing this technique is simple and works well to guide the LLM to generate replies given its internal knowledge and no other contexts. It’s a subtle yet powerful technique that applies the knowledge of the LLM to other applications. This technique, combined with other prompting strategies, is proving effective at replacing other language classification models—models that identify the emotion or sentiment in text, for example.
Open prompt_flow/zero-shot-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. This flow is again almost identical to that shown earlier in figure 10.2 but differs slightly in implementation, as shown in listing 10.5.
Inputs:
    statement: the statement to be classified
    expected: the expected classification of the statement
LLM: zero_shot (the prompt used to classify)
    inputs: statement
    outputs: the predicted class given the statement
Embeddings: uses an LLM embedding model to create the embedding representation of the text
    Embedding_predicted: embeds the output of the zero_shot LLM
    Embedding_expected: embeds the output of the expected answer
Python: Evaluation (Python code to measure embedding similarity)
    inputs:
        Embedding_predicted output
        Embedding_expected output
    outputs: the similarity score between predicted and expected
Outputs:
    statement: -> input.statement
    expected: -> input.expected
    predicted: -> output.zero_shot
    evaluation_score: output.evaluation

### Example Output
{
  "evaluation_score": 1, #1
  "expected": "neutral",
  "predicted": "neutral",
  "statement": "I think the vacation is okay. " #2
}
Run the flow by pressing Shift-F5 within the VS Code prompt flow visual editor. You should see output similar to that shown in listing 10.5.
Now open the zero_shot.jinja2 prompt as shown in listing 10.6. The prompt is simple and uses no examples to extract the sentiment from the text. What is especially interesting to note is that the prompt doesn’t even mention the phrase sentiment, and the LLM seems to understand the intent.
system:
Classify the text into neutral, negative or positive.
Return only the result and nothing else. #1
user:
{{statement}} #2
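Before the text reaches the model, prompt flow renders the .jinja2 template with the flow’s input values. A dependency-free sketch of that substitution step (real flows use Jinja2 rendering; the inline template body here mirrors zero_shot.jinja2):

```python
# Inline copy of the template body; rendering fills in the
# {{statement}} placeholder before the text is sent to the LLM.
template = (
    "system:\n"
    "Classify the text into neutral, negative or positive.\n"
    "Return only the result and nothing else.\n"
    "user:\n"
    "{{statement}}"
)

def render(template: str, **values) -> str:
    # Minimal placeholder substitution standing in for Jinja2
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = render(template, statement="I think the vacation is okay.")
print(prompt)
```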
Zero-shot prompt engineering is about using the ability of the LLM to generalize broadly based on its training material. This exercise demonstrates how knowledge within the LLM can be put to work for other tasks. The LLM’s ability to self-contextualize and apply knowledge can extend beyond its training. In the next section, we extend this concept further by looking at how LLMs can reason.
LLMs like ChatGPT were developed to function as chat completion models, where text content is fed into the model, whose responses align with completing that request. LLMs were never trained to reason, plan, think, or have thoughts.
However, much like we demonstrated with the examples in the previous section, LLMs can be prompted to extract their generalities and be extended beyond their initial design. While an LLM isn’t designed to reason, the training material fed into the model provides an understanding of reasoning, planning, and thought. Therefore, by extension, an LLM understands what reasoning is and can employ the concept of reasoning.
We’ll look at another set of prompt engineering techniques that allow or mimic reasoning behavior to demonstrate this reasoning ability. Typically, when evaluating the application of reasoning, we look to having the LLM solve challenging problems it wasn’t designed to solve. A good source of such problems is logic, math, and word problems.
Using the time travel theme, what class of unique problems could be better to solve than understanding time travel? Figure 10.3 depicts one example of a uniquely challenging time travel problem. Our goal is to acquire the ability to prompt the LLM in a manner that allows it to solve the problem correctly.
Time travel problems are thought exercises that can be deceptively difficult to solve. The example in figure 10.3 is complicated for an LLM to solve, but the part it gets wrong may surprise you. The next section will use reasoning in prompts to solve these unique problems.
Chain of thought (CoT) prompting is a prompt engineering technique that employs one-shot or few-shot examples describing the reasoning and the steps to accomplish a desired goal. Through the demonstration of reasoning, the LLM can generalize this principle and reason through similar problems and goals. While the LLM isn’t trained with the goal of reasoning, we can elicit the model to reason using prompt engineering.
Open prompt_flow/chain-of-thought-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. The elements of this flow are simple, as shown in figure 10.4. With only two LLM blocks, the flow first uses a CoT prompt to solve a complex question; then, the second LLM prompt evaluates the answer.
Listing 10.7 shows the YAML pseudocode that describes the blocks and the inputs/outputs of the flow in more detail. The default problem statement in this example isn’t the same as in figure 10.3.
Inputs:
    statement: the statement problem to be solved
    expected: the expected solution to the problem
LLM: cot (the prompt used to solve the problem)
    inputs: statement
    outputs: the predicted answer given the problem statement
LLM: evaluate_answer (the prompt used to evaluate the solution)
    inputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
    outputs: a score of how well the problem was answered
Outputs:
    statement: -> input.statement
    expected: -> input.expected
    predicted: -> output.cot
    evaluation_score: output.evaluate_answer

### Example Output
{
  "evaluation_score": "0.5", #1
  "expected": "After the final jump, Max finds himself in the year 1980 and he is 75 years old.", #2
  "predicted": "Max starts in the year 2300 and travels backward in 40-year increments, spending 5 years in each period. The journeys will be as follows:
    \n\n- From 2300 to 2260: Max is 25 + 5 = 30 years old.
    \n- From 2260 to 2220: Max is 30 + 5 = 35 years old.
    \n- From 2220 to 2180: Max is 35 + 5 = 40 years old.
    \n- From 2180 to 2140: Max is 40 + 5 = 45 years old.
    \n- From 2140 to 2100: Max is 45 + 5 = 50 years old.
    \n- From 2100 to 2060: Max is 50 + 5 = 55 years old.
    \n- From 2060 to 2020: Max is 55 + 5 = 60 years old.
    \n- From 2020 to 1980: Max is 60 + 5 = 65 years old.
    \n- From 1980 to 1940: Max is 65 + 5 = 70 years old.
    \n- From 1940 to 1900: Max is 70 + 5" #3
}
Dig into the inputs and check the problem statement; try to evaluate the problem yourself. Then, run the flow by pressing Shift-F5. You should see output similar to that shown in listing 10.7.
Open the cot.jinja2 prompt file as shown in listing 10.8. This prompt gives a few examples of time travel problems and then the thought-out and reasoned solution. The process of showing the LLM the steps to complete the problem provides the reasoning mechanism.
system:
"In a time travel movie, Sarah travels back in time to
prevent a historic event from happening. She arrives
2 days before the event. After spending a day preparing,
she attempts to change the event but realizes she has
actually arrived 2 years early, not 2 days. She then
decides to wait and live in the past until the event's
original date. How many days does Sarah spend in the past
before the day of the event?" #1
Chain of Thought: #2
Initial Assumption: Sarah thinks she has arrived 2 days before the event.
Time Spent on Preparation: 1 day spent preparing.
Realization of Error: Sarah realizes she's actually 2 years early.
Conversion of Years to Days:
2 years = 2 × 365 = 730 days (assuming non-leap years).
Adjust for the Day Spent Preparing: 730 - 1 = 729 days.
Conclusion: Sarah spends 729 days in the past before the day of the event.
"In a sci-fi film, Alex is a time traveler who decides
to go back in time to witness a famous historical battle
that took place 100 years ago, which lasted for 10 days.
He arrives three days before the battle starts. However,
after spending six days in the past, he jumps forward in
time by 50 years and stays there for 20 days. Then, he
travels back to witness the end of the battle. How many
days does Alex spend in the past before he sees the end of
the battle?" #3
Chain of Thought: #4
Initial Travel: Alex arrives three days before the battle starts.
Time Spent Before Time Jump: Alex spends six days in the past.
The battle has started and has been going on for 3 days (since he
arrived 3 days early and has now spent 6 days, 3 + 3 = 6).
First Time Jump: Alex jumps 50 years forward and stays for 20 days.
This adds 20 days to the 6 days he's already spent in the past
(6 + 20 = 26).
Return to the Battle: When Alex returns, he arrives back on the same
day he left (as per time travel logic). The battle has been going on for
3 days now.
Waiting for the Battle to End: The battle lasts 10 days. Since he's
already witnessed 3 days of it, he needs to wait for 7 more days.
Conclusion: Alex spends a total of 3 (initial wait) + 3 (before the
first jump) + 20 (50 years ago) + 7 (after returning) = 33 days in the
past before he sees the end of the battle.
Think step by step but only show the final answer to the statement.
user:
{{statement}} #5
You may note that the solution to figure 10.3 is also provided as an example in listing 10.8. It’s also helpful to go back and review listing 10.7 for the reply from the LLM about the problem. From this, you can see the reasoning steps the LLM applied to get its final answer.
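The arithmetic behind the Sarah example in listing 10.8 can be checked directly; under the listing’s stated assumption of non-leap years, the whole chain of thought reduces to two lines of math:

```python
# Sarah example from listing 10.8, assuming non-leap years
years_early = 2
days_early = years_early * 365   # 2 years expressed in days
days_in_past = days_early - 1    # subtract the day spent preparing
print(days_in_past)  # → 729, matching the chain-of-thought conclusion
```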
Now, we can look at the prompt that evaluates how well the solution solved the problem. Open evaluate_answer.jinja2, shown in listing 10.9, to review the prompt used. The prompt is simple, uses zero-shot prompting, and allows the LLM to generalize how it should score the expected and predicted. We could provide examples and scores, thus changing this to an example of a few-shot classification.
system:
Please confirm that expected and predicted results are
the same for the given problem. #1
Return a score from 0 to 1 where 1 is a perfect match and 0 is no match.
Please just return the score and not the explanation. #2
user:
Problem: {{problem}} #3
Expected result: {{expected}} #4
Predicted result: {{predicted}} #5
Looking at the LLM output shown earlier in listing 10.7, you can see why the evaluation step may get confusing. Perhaps a fix to this could be suggesting to the LLM to provide the final answer in a single statement. In the next section, we move on to another example of prompt reasoning.
As our time travel demonstrates, CoT prompting can be expensive in terms of prompt generation for a specific class of problem. While not as effective, there are techniques similar to CoT that don’t use examples and can be more generalized. This section will examine a straightforward phrase employed to elicit reasoning in LLMs.
Open prompt_flow/zero-shot-cot-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. This flow is very similar to the previous CoT, as shown in figure 10.4. The next listing shows the YAML pseudocode that describes the flow.
Inputs:
    statement: the statement problem to be solved
    expected: the expected solution to the problem
LLM: cot (the prompt used to solve the problem)
    inputs: statement
    outputs: the predicted answer given the problem statement
LLM: evaluate_answer (the prompt used to evaluate the solution)
    inputs:
        statement: -> input.statement
        expected: -> input.expected
        predicted: -> output.cot
    outputs: a score of how well the problem was answered
Outputs:
    statement: -> input.statement
    expected: -> input.expected
    predicted: -> output.cot
    evaluation_score: output.evaluate_answer

### Example Output
{
  "evaluation_score": "1", #1
  "expected": "After the final jump, Max finds himself in the year 1980 and he is 75 years old.", #2
  "predicted": "Max starts in… Therefore, after the final jump, Max is 75 years old and in the year 1980.", #3
  "statement": "In a complex time travel …" #4
}
Run/test the flow in VS Code by pressing Shift-F5 while in the visual editor. The flow will run, and you should see output similar to that shown in listing 10.10. This exercise example performs better than the previous example on the same problem.
Open the cot.jinja2 prompt in VS Code, as shown in listing 10.11. This prompt is much simpler than the previous example because it uses only zero-shot prompting. However, one key phrase turns this simple prompt into a powerful reasoning engine. The line in the prompt Let’s think step by step triggers the LLM to surface its internal reasoning, which in turn directs it to work out the problem in steps.
system:
You are an expert in solving time travel problems.
You are given a time travel problem and you have to solve it.
Let's think step by step. #1
Please finalize your answer in a single statement. #2
user:
{{statement}} #3
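The effect of this template can be sketched in plain Python. The render helper below is a simplified stand-in for prompt flow’s Jinja2 rendering, and the statement string is a hypothetical example, not the book’s exact wording:

```python
# Minimal sketch of assembling the zero-shot CoT prompt.
# render() is a simplified stand-in for Jinja2 template rendering.

COT_TEMPLATE = """system:
You are an expert in solving time travel problems.
You are given a time travel problem and you have to solve it.
Let's think step by step.
Please finalize your answer in a single statement.
user:
{{statement}}"""

def render(template: str, **values: str) -> str:
    """Substitute {{name}} placeholders (enough for this simple template)."""
    for name, value in values.items():
        template = template.replace("{{" + name + "}}", value)
    return template

prompt = render(COT_TEMPLATE,
                statement="In a sci-fi film, Alex travels back in time...")
```

The rendered prompt is then sent as-is to the chat model; the trigger phrase does all the work.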
Similar phrases asking the LLM to think about the steps or asking it to respond in steps also extract reasoning. We’ll demonstrate a similar but more elaborate technique in the next section.
We can extend the behavior of asking an LLM to think step by step into a chain of prompts that force the LLM to solve the problem in steps. In this section, we look at a technique called prompt chaining that forces an LLM to process problems in steps.
Open the prompt_flow/prompt-chaining/flow.dag.yaml file in the visual editor, as shown in figure 10.5. Prompt chaining breaks up the reasoning method used to solve a problem into chains of prompts. This technique forces the LLM to answer the problem in terms of steps.
Listing 10.12 shows the YAML pseudocode that describes the flow in a bit more detail. This flow chains the output of the first LLM block into the second and then from the second into the third. Forcing the LLM to process the problem this way uncovers the reasoning pattern, but it can also be overly verbose.
Inputs:
statement : the statement problem to be solved
LLM: decompose_steps (the prompt used to decompose the problem)
inputs:
statement: -> input.statement #1
outputs: the breakdown of steps to solve the problem
LLM: calculate_steps (the prompt used to calculate the steps)
inputs:
statement: -> input.statement
decompose_steps: -> output.decompose_steps #2
outputs: the calculation for each step
LLM: calculate_solution (attempts to solve the problem)
inputs:
statement: -> input.statement
decompose_steps: -> output.decompose_steps
calculate_steps: -> output.calculate_steps #3
outputs: the final solution statement
Outputs:
statement: -> input.statement
decompose_steps: -> output.decompose_steps
calculate_steps: -> output.calculate_steps
calculate_solution: -> output.calculate_solution
### Example Output
{
"calculate_steps": "1. The days spent by Alex",
"decompose_steps": "To figure out the …",
"solution": "Alex spends 13 days in the ↪
↪ past before the end of the battle.", #4
"statement": "In a sci-fi film, Alex …"
}
Run the flow by pressing Shift-F5 from the visual editor, and you’ll see the output as shown in listing 10.12. The answer is still not correct for the Alex problem, but we can see all the work the LLM is doing to reason out the problem.
Open up all three prompts: decompose_steps.jinja2, calculate_steps.jinja2, and calculate_solution.jinja2 (see listings 10.13, 10.14, and 10.15, respectively). All three prompts shown in the listings can be compared to show how outputs chain together.
system:
You are a problem solving AI assistant.
Your job is to break the users problem down into smaller steps and list
the steps in the order you would solve them.
Think step by step, not in generalities.
Do not attempt to solve the problem, just list the steps. #1
user:
{{statement}} #2
system:
You are a problem solving AI assistant.
You will be given a list of steps that solve a problem.
Your job is to calculate the output for each of the steps in order.
Do not attempt to solve the whole problem,
just list output for each of the steps. #1
Think step by step. #2
user:
{{statement}}
{{steps}} #3
system:
You are a problem solving AI assistant.
You will be given a list of steps and the calculated output for each step.
Use the calculated output from each step to determine the final
solution to the problem.
Provide only the final solution to the problem in a
single concise sentence. Do not include any steps
in your answer. #1
user:
{{statement}}
{{steps}} #2
{{calculated}} #3
In this exercise example, we’re not performing any evaluation and scoring. Even without evaluation, we can see that this sequence of prompts still has trouble solving the more challenging time travel problem shown earlier in figure 10.3. However, that doesn’t mean the technique lacks value; this prompting format solves some complex problems well.
What we want to find, however, is a reasoning and planning methodology that can solve such complex problems consistently. The following section moves from reasoning to evaluating the best solution.
In the previous section, we learned that even the best-reasoned plans may not always derive the correct solution. Furthermore, we may not always have the answer to confirm if that solution is correct. The reality is that we often want to use some form of evaluation to determine the efficacy of a solution.
Figure 10.6 shows a comparison of the prompt engineering strategies that have been devised as a means of getting LLMs to reason and plan. We’ve already covered the two on the left: zero-shot direct prompting and CoT prompting. The following example exercises in this section will look at self-consistency with the CoT and ToT techniques.
We’ll continue to focus on the complex time travel problem to compare these more advanced methods that expand on reasoning and planning with evaluation. In the next section, we’ll evaluate self-consistency.
Consistency in prompting is more than just lowering the temperature parameter we send to an LLM. Often, we want to generate a consistent plan or solution and still use a high temperature to better evaluate all the variations to a plan. By evaluating multiple different plans, we can get a better sense of the overall value of a solution.
Self-consistent prompting is the technique of generating multiple plans/solutions for a given problem. Then, those plans are evaluated, and the more frequent or consistent plan is accepted. Imagine three plans generated, where two are similar, but the third is different. Using self-consistency, we evaluate the first two plans as the more consistent answer.
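The selection step can be sketched in plain Python. This is a simplified stand-in for the embedding-based check used later in this section; in practice, the embeddings would come from an embedding model rather than being hand-written:

```python
import math

def most_consistent(texts, embeddings):
    """Return the text whose embedding is closest to the mean embedding.

    Answers that agree with each other sit near the mean, so the answer
    most similar to the mean is the most "consistent" one.
    """
    dims = len(embeddings[0])
    mean = [sum(e[d] for e in embeddings) / len(embeddings)
            for d in range(dims)]

    def cosine_sim(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (math.sqrt(sum(x * x for x in a))
                * math.sqrt(sum(y * y for y in b)))
        return dot / norm

    scores = [cosine_sim(e, mean) for e in embeddings]
    return texts[scores.index(max(scores))]
```

With three answers where two agree, the mean embedding sits near the agreeing pair, so one of those two is selected.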
Open prompt_flow/self-consistency-prompting/flow.dag.yaml in the VS Code prompt flow visual editor. The flow diagram shows the simplicity of the prompt generation flow in figure 10.7. Next to it in the diagram is the self-consistency evaluation flow.
Prompt flow uses a directed acyclic graph (DAG) format to execute the flow logic. DAGs are an excellent way of demonstrating and executing flow logic, but because they are acyclic, meaning they can’t repeat, they can’t execute loops. However, because prompt flow provides a batch processing mechanism, we can use that to simulate loops or repetition in a flow.
Referring to figure 10.6, we can see that self-consistency processes the input three times before collecting the results and determining the best plan/reply. We can apply this same pattern but use batch processing to generate the outputs. Then, the evaluation flow will aggregate the results and determine the best answer.
Open the self-consistency-prompting/cot.jinja2 prompt template in VS Code (see listing 10.16). The listing was shortened, as we’ve seen parts before. This prompt uses two (few-shot prompt) examples of a CoT to demonstrate the thought reasoning to the LLM.
system:
"In a time travel movie, Sarah travels back… " #1
Chain of Thought:
Initial Assumption: … #2
Conclusion: Sarah spends 729 days in the past before the day of the event.
"In a complex time travel movie plot, Max, a 25 year old…" #3
Chain of Thought:
Starting Point: Max starts … #4
Conclusion: After the final jump,
Max finds himself in the year 1980 and he is 75 years old.
Think step by step,
but only show the final answer to the statement. #5
user:
{{statement}}
Open the self-consistency-prompting/flow.dag.yaml file in VS Code. Run the example in batch mode by clicking Batch Run (the beaker icon) from the visual editor. Figure 10.8 shows the process step by step, using statements.jsonl as the batch input.
TIP If you need to review the process, refer to chapter 9, which covers this process in more detail.
Listing 10.17 shows the JSON output from executing the flow in batch mode. The statements.jsonl file has five identical Alex time travel problem entries. Using identical entries allows us to simulate the prompt executing five times on the duplicate entry.
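Producing that batch input is just a matter of writing the same entry five times to a JSONL file. A minimal sketch (the statement text is abbreviated here, not the book’s exact wording):

```python
import json

# Write the batch input: the same Alex problem repeated five times, so one
# batch run samples the prompt five times on an identical statement.
statement = "In a sci-fi film, Alex, a 25-year-old time traveler, ..."

lines = [json.dumps({"statement": statement}) for _ in range(5)]
with open("statements.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```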
{
"name": "self-consistency-prompting_default_20240203_100322_912000",
"created_on": "2024-02-03T10:22:30.028558",
"status": "Completed",
"display_name": "self-consistency-prompting_variant_0_202402031022",
"description": null,
"tags": null,
"properties": {
"flow_path": "…prompt_flow/self-consistency-prompting", #1
"output_path": "…/.promptflow/.runs/self-
↪ consistency-prompting_default_20240203_100322_912000", #2
"system_metrics": {
"total_tokens": 4649,
"prompt_tokens": 3635,
"completion_tokens": 1014,
"duration": 30.033773
}
},
"flow_name": "self-consistency-prompting",
"data": "…/prompt_flow/self-consistency-prompting/
↪ statements.jsonl", #3
"output": "…/.promptflow/.runs/self-consistency-↪
↪ prompting_default_20240203_100322_912000/flow_outputs"
}
You can view the flow produced by pressing the Ctrl key and clicking the output link, highlighted in listing 10.17. This will open another instance of VS Code, showing a folder with all the output from the run. We now want to check the most consistent answer. Fortunately, the evaluation feature in prompt flow can help us identify consistent answers using similarity matching.
Open self-consistency-evaluation/flow.dag.yaml in VS Code (see figure 10.7). This flow embeds the predicted answer and then uses an aggregation to determine the most consistent answer.
From the flow, open consistency.py in VS Code, as shown in listing 10.18. The code for this tool function calculates the cosine similarity for all pairs of answers. Then, it finds the most similar answer, logs it, and outputs that as the answer.
from promptflow import tool
from typing import List
import numpy as np
from scipy.spatial.distance import cosine

@tool
def consistency(texts: List[str],
                embeddings: List[List[float]]) -> str:
    if len(embeddings) != len(texts):
        raise ValueError("The number of embeddings "
                         "must match the number of texts.")
    mean_embedding = np.mean(embeddings, axis=0)  #1
    similarities = [1 - cosine(embedding, mean_embedding)
                    for embedding in embeddings]  #2
    most_similar_index = np.argmax(similarities)  #3
    from promptflow import log_metric
    log_metric(key="highest_ranked_output",
               value=texts[most_similar_index])  #4
    return texts[most_similar_index]  #5
We need to run the evaluation flow in batch mode as well. Open self-consistency-evaluation/flow.dag.yaml in VS Code and run the flow in batch mode (beaker icon). Then, select Existing Run as the flow input, and when prompted, choose the top or the last run you just executed as input.
Again, after the flow completes processing, you’ll see an output like that shown in listing 10.17. Ctrl-click on the output folder link to open a new instance of VS Code showing the results. Locate and open the metric.json file in VS Code, as shown in figure 10.9.
The answer shown in figure 10.9 is still incorrect for this run. You can continue a few more batch runs of the prompt and/or increase the number of runs in a batch and then evaluate flows to see if you get better answers. This technique is generally more helpful for more straightforward problems but still demonstrates an inability to reason out complex problems.
Self-consistency uses a reflective approach to evaluate the most likely thought. However, the most likely thought certainly isn’t always the best one. Therefore, we must consider a more comprehensive approach in the next section.
As mentioned earlier, ToT prompting, as shown in figure 10.6, combines self-evaluation and prompt chaining techniques. As such, it breaks down the sequence of planning into a chain of prompts, but at each step in the chain, it provides for multiple evaluations. This creates a tree that can be executed and evaluated at each level, breadth-first, or from top to bottom, depth-first.
Figure 10.10 shows the difference between executing a tree using breadth-first or depth-first. Unfortunately, due to the DAG execution pattern of prompt flow, we can’t quickly implement the depth-first method, but breadth-first works just fine.
Open tree-of-thought-evaluation/flow.dag.yaml in VS Code. The visual of the flow is shown in figure 10.11. This flow functions like a breadth-first ToT pattern—the flow chains together a series of prompts asking the LLM to return multiple plans at each step.
Because the flow executes in a breadth-first style, each level output of the nodes is also evaluated. Each node in the flow uses a pair of semantic functions—one to generate the answer and the other to evaluate the answer. The semantic function is a custom Python flow block that processes multiple inputs and generates multiple outputs.
Listing 10.19 shows the semantic_function.py tool. This general tool is reused for multiple blocks in this flow. It also demonstrates how functionality from the Semantic Kernel (SK) can be embedded for direct use within prompt flow.
@tool
def my_python_tool(
    input: str,
    input_node: int,
    history: str,
    semantic_function: str,
    evaluation_function: str,
    function_name: str,
    skill_name: str,
    max_tokens: int,
    temperature: float,
    deployment_name: str,
    connection: Union[OpenAIConnection,
                      AzureOpenAIConnection],  #1
) -> str:
    if input is None or input == "":  #2
        return ""
    kernel = sk.Kernel(log=sk.NullLogger())
    # code for setting up the kernel and LLM connection omitted
    function = kernel.create_semantic_function(
        semantic_function,
        function_name=function_name,
        skill_name=skill_name,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=0.5)  #3
    evaluation = kernel.create_semantic_function(
        evaluation_function,
        function_name="Evaluation",
        skill_name=skill_name,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=0.5)  #4

    async def main():
        query = f"{history}\n{input}"
        try:
            eval = int((await evaluation.invoke_async(query)).result)
            if eval > 25:  #5
                return await function.invoke_async(query)  #6
        except Exception as e:
            raise Exception("Evaluation failed", e)

    try:
        result = asyncio.run(main()).result
        return result
    except Exception as e:
        print(e)
        return ""
The semantic function tool is used in the tree’s experts, nodes, and answer blocks. At each step, the function determines if any text is being input. If there is no text, the block returns with no execution. Passing no text to a block means that the previous block failed evaluation. By evaluating before each step, ToT short-circuits the execution of plans it deems as not being valid.
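One way to sketch this breadth-first, evaluate-then-expand behavior: generate and evaluate stand in for the paired semantic functions, and 25 is the pass threshold used in semantic_function.py. The names and branching factor here are illustrative, not the flow’s exact configuration:

```python
# Expand one level of a ToT tree breadth-first: every candidate is evaluated
# first, and only candidates scoring above the threshold are expanded.
# Failed (or already-empty) candidates propagate "" to their children.

def tot_level(candidates, evaluate, generate, threshold=25, branching=3):
    next_level = []
    for text in candidates:
        if not text:  # parent already failed evaluation; propagate ""
            next_level.extend([""] * branching)
            continue
        if evaluate(text) <= threshold:  # short-circuit: don't expand
            next_level.extend([""] * branching)
            continue
        next_level.extend(generate(text) for _ in range(branching))
    return next_level
```

Running the function level by level reproduces the pattern in the flow’s output: empty strings mark nodes whose own evaluation, or whose parent’s, failed.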
This may be a complex pattern to grasp at first, so go ahead and run the flow in VS Code. Listing 10.20 shows just the answer node output of a run; these results may vary from what you see but should be similar. Nodes that return no text either failed evaluation or their parents did.
{
"answer_1_1": "", #1
"answer_1_2": "",
"answer_1_3": "",
"answer_2_1": "Alex spends a total of 29 days in the past before he
sees the end of the battle.",
"answer_2_2": "", #2
"answer_2_3": "Alex spends a total of 29 days in the past before he
sees the end of the battle.",
"answer_3_1": "", #3
"answer_3_2": "Alex spends a total of 29 days in the past before he
sees the end of the battle.",
"answer_3_3": "Alex spends a total of 9 days in the past before he
sees the end of the battle."
}
The output in listing 10.20 shows how only a select set of nodes was evaluated. In most cases, the evaluated nodes returned an answer that could be valid. Where no output was produced, it means that the node itself or its parent wasn’t valid. When sibling nodes all return empty, the parent node fails to evaluate.
As we can see, ToT is valid for complex problems but perhaps not very practical. The execution of this flow can take up to 27 calls to an LLM to generate an output. In practice, it may only do half that many calls, but that’s still a dozen or more calls to answer a single problem.
Use the following exercises to improve your knowledge of the material:
Objective —Create three different prompts for an LLM to summarize a recent scientific article: one using direct prompting, one with few-shot prompting, and the last employing zero-shot prompting.
Tasks:
Objective —Design a set of prompts that require the LLM to solve logical puzzles or riddles.
Tasks:
Objective —Develop an evaluation prompt that asks the LLM to predict the outcome of a hypothetical experiment.
Task:
Now that we’ve examined how large language models (LLMs) can reason and plan, this chapter takes this concept a step further by employing planning within an agent framework. Planning should be at the core of any agent/assistant platform or toolkit. We’ll start by looking at the basics of planning and how to implement a planner through prompting. Then, we’ll see how planning operates using the OpenAI Assistants platform, which automatically incorporates planning. From there, we’ll build and implement a general planner for LLMs.
Planning can only go so far, and an often-unrecognized element is feedback. Therefore, in the last sections of the chapter, we explore feedback and implement it within a planner. You must be familiar with the content of chapter 10, so please review it if you need to, and when you’re ready, let’s begin planning.
Agents and assistants who can’t plan and only follow simple interactions are nothing more than chatbots. As we’ve seen throughout this book, our goal isn’t to build bots but rather to build autonomous thinking agents—agents that can take a goal, work out how to solve it, and then return with the results.
Figure 11.1 explains the overall planning process that the agent/assistant will undertake. This figure was also presented in chapter 1, but let’s review it now in more detail. At the top of the figure, a user submits a goal. In an agentic system, the agent takes the goal, constructs the plan, executes it, and then returns the results.
Depending on your interaction with platforms such as ChatGPT and GPTs, Claude, and others, you may have already encountered a planning assistant and not even noticed. Planning is becoming ubiquitous and is now built into most commercial platforms to make the model appear more intelligent and capable. Therefore, in the next exercise, we’ll look at an example to set a baseline and differentiate between an LLM that can’t plan and an agent that can.
For the next exercise, we’ll use Nexus to demonstrate how raw LLMs can’t plan independently. If you need assistance installing, setting up, and running Nexus, refer to chapter 7. After you have Nexus installed and ready, we can begin running it with the Gradio interface, using the commands shown next.
nexus run gradio
Gradio is an excellent web interface tool built to demonstrate Python machine learning projects. Figure 11.2 shows the Gradio Nexus interface and the process for creating an agent and using an agent engine (OpenAI, Azure, and Groq) of your choice. You can’t use LM Studio unless the model/server supports tool/action use. Anthropic’s Claude supports internal planning, so for the purposes of this exercise, avoid using this model.
After creating the agent, we want to give it specific actions (tools) to undertake or complete a goal. Generally, providing only the actions an agent needs to complete its goal is best for a few reasons:
WARNING While writing this book and working with and building agents over many hours, I have encountered several instances of agents going rogue with actions, from downloading files to writing and executing code when not intended, continually iterating from tool to tool, and even deleting files they shouldn’t have. Watching an agent emerge new behaviors using actions can be fun, but things can quickly go astray.
For this exercise, we’ll define the goal described in the following listing.
Search Wikipedia for pages on {topic} and download each page and save it
to a file called Wikipedia_{topic}.txt
This goal will demonstrate the following actions:
search_wikipedia(topic)—Searches Wikipedia and returns page IDs for the given search term.
get_wikipedia_page(page_id)—Downloads the page content given the page ID.
save_file—Saves the content to a file.
Set the actions on the agent, as shown in figure 11.3. You’ll also want to make sure the Planner is set to None. We’ll look at setting up and using planners soon. You don’t have to click Save; the interface automatically saves an agent’s changes.
After you choose the actions and planner, enter the goal in listing 11.2. Then click Create New Thread to instantiate a new conversation. Substitute the topic you want to search for in the chat input, and wait for the agent to respond. Here’s an example of the goal filled with the topic, but again, use any topic you like:
Search Wikipedia for pages on Calgary and download each page and save it to a file called Wikipedia_Calgary.txt.
Figure 11.4 shows the results of submitting the goal to the plain agent. We see the agent executed the tool/action to search for the topic but couldn’t execute any steps beyond that. If you recall from our discussion and code example of actions in chapter 5, OpenAI, Groq, and Azure OpenAI all support parallel actions but not sequential or planned actions.
The LLM can answer reasonably well if you submit a goal with several parallel tasks/actions. However, if the actions are sequential, requiring one step to be dependent on another, it will fail. Remember, parallel actions are standalone actions that can be run alongside others.
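To see why the Wikipedia goal needs sequential planning, it helps to write it as a pipeline: each step consumes the output of the previous one, which is exactly what parallel-only tool calling cannot express. The action names match those defined earlier; the implementations are injected stubs, not the real actions:

```python
# The goal as a dependent pipeline. A planner must order these calls;
# a model limited to parallel tool calls can only fire step 1.

def run_goal(topic, search_wikipedia, get_wikipedia_page, save_file):
    page_ids = search_wikipedia(topic)                       # step 1
    pages = [get_wikipedia_page(pid) for pid in page_ids]    # step 2 needs step 1
    save_file(f"Wikipedia_{topic}.txt", "\n\n".join(pages))  # step 3 needs step 2
```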
Anthropic’s Claude and OpenAI Assistants support sequential action planning. This means both models can be called with sequential plans, and the model will execute them and return the results. In the next section, we’ll explore sequential planning and then demonstrate it in action.
在接下来的练习中,我们将使用 OpenAI 助手来解决相同的目标。如果您拥有 Anthropic/Claude 的账号凭据并且已配置好引擎,也可以使用该模型尝试完成此练习。
In the next exercise, we’ll ask an OpenAI assistant to solve the same goal. If you have Anthropic/Claude credentials and have the engine configured, you can also try this exercise with that model.
图 11.5 展示了按顺序执行任务(规划)和使用迭代执行任务之间的区别。如果您使用过 GPT、智能助手或 Claude 3.5 Sonnet,您可能已经体验过这种区别。这些高级工具已经通过提示标注、高级训练或两者结合的方式实现了规划。
Figure 11.5 shows the difference between executing tasks sequentially (planning) and using iteration. If you’ve used GPTs, assistants, or Claude 3.5 Sonnet, you’ve likely already experienced this difference. These advanced tools already incorporate planning through prompt annotations, advanced training, or a combination of both.
随着大语言模型(LLM)和聊天服务的不断发展,大多数模型很可能都会原生支持某种形式的规划和工具使用。然而,目前大多数模型(包括 GPT-4o)仅支持操作/工具的使用。
As LLM and chat services evolve, most models will likely natively support some form of planning and tool use. However, most models, including GPT-4o, only support action/tool use today.
让我们打开 GPT Assistants Playground 来演示顺序规划的实际应用。如果需要帮助,请参阅第 6 章的设置指南。我们将使用相同的目标,但这次将使用内置规划功能的助手来运行。
Let’s open the GPT Assistants Playground to demonstrate sequential planning in action. If you need help, refer to the setup guide in chapter 6. We’ll use the same goal but, this time, run it against an assistant (which has built-in planning).
启动 Playground 后,创建一个新的助手,并为其分配 search_wikipedia、get_wikipedia_page 和 save_file 操作。图 11.6 显示了向助手输入目标后的结果。如图所示,助手在后台完成了所有任务,并返回了用户最终请求的输出,从而实现了目标。
After you launch the Playground, create a new assistant, and assign it the search_wikipedia, get_wikipedia_page, and save_file actions. Figure 11.6 shows the results of entering the goal to the assistant. As you can see, the assistant completed all the tasks behind the scenes and responded with the user’s final requested output, achieving the goal.
为了展示 OpenAI Assistant 规划器的有效性,我们在目标中添加了一个新任务:总结每一页的内容。这个新任务没有对应的功能/工具,但 Assistant 足够智能,能够利用其功能总结内容。您可以打开文件[root folder]assistants_working_folder/Wikipedia_{topic}.txt并查看内容,了解 Assistant 的输出结果。现在我们已经了解了 LLM 在没有规划器和规划的情况下是如何工作的,接下来我们将在下一节中创建规划器。
To demonstrate the effectiveness of the OpenAI Assistant’s planner, we added another task, summarizing each page, to the goal. The inserted task didn’t have a function/tool, but the assistant was savvy enough to use its ability to summarize the content. You can see the output of what the assistant produced by opening the [root folder]assistants_working_folder/Wikipedia_{topic}.txt file and reviewing the contents. Now that we understand how LLMs function without planners and planning, we can move on to creating our planners in the next section.
诸如 LangChain 和 Semantic Kernel (SK) 之类的 LLM 工具拥有许多采用不同策略的规划器。然而,编写我们自己的规划器相对容易,而且 Nexus 还支持插件式接口,允许您添加来自 LangChain 和 SK 等工具的其他规划器,或者您自己的衍生规划器。
LLM tools such as LangChain and Semantic Kernel (SK) have many planners using various strategies. However, writing our own planner is relatively easy, and Nexus also supports a plugin-style interface, allowing you to add other planners from tools such as LangChain and SK, or your own derivatives.
规划器听起来可能很复杂,但通过结合规划和推理的快速工程策略,它们很容易实现。在第十章中,我们介绍了推理和制定计划的基础知识,现在我们可以充分利用这些技能了。
Planners may sound complicated, but they are easily implemented through prompt engineering strategies that incorporate planning and reasoning. In chapter 10, we covered the basics of reasoning and deriving plans, and now we can put those skills to good use.
清单 11.3 展示了一个基于 SK 的顺序规划器,并对其进行了扩展以包含迭代功能。类似清单中所示的提示注释规划器可以根据特定需求进行调整,也可以像清单中所示的规划器一样更加通用。此规划器使用 JSON,但规划器可以使用 LLM 可识别的任何格式,包括代码。
Listing 11.3 shows a sequential planner derived from the SK, which is extended to incorporate iteration. Prompt annotation planners like those shown in the listing can be adapted to fit specific needs or be more general like those shown. This planner uses JSON, but planners could use any format an LLM understands, including code.
basic_nexus_planner.py
你是 Nexus 的规划师。 #1
你的任务是逐步创建一个格式正确的 JSON 计划,
达到既定目标。
根据提供的[目标]创建子任务列表。
每个子任务必须来自 [可用功能] 列表。请勿
使用列表中未列出的任何函数。
根据描述和说明来决定使用哪些功能。
函数名称。
有时,函数可能需要参数。如有必要,请提供参数。
计划应尽可能简短。
你还会收到来自之前计划的一份纠正性、建议性和认知性反馈清单,
以帮助你做出决定。
例如:
【特殊功能】 #2
for-each-前缀
描述:对列表中的每个项目执行一个函数
参数:
- 函数:要执行的函数
- list:要迭代的项目列表
- index:列表中当前项的参数名称
【可用功能】
GetJokeTopics
描述:获取笑话主题列表([str])
EmailTo
描述:将输入的文本通过电子邮件发送给收件人
参数:
- text:要发送的文本
- recipient:收件人的电子邮件地址。如果以";"分隔,可以包含多个地址。
Summarize
描述:汇总输入文本
参数:
- text:要汇总的文本
Joke
描述:生成一个有趣的笑话
参数:
- topic:用来创作笑话的主题
[目标]
"获取笑话主题列表,并为每个主题生成一个不同的笑话。把这些笑话通过电子邮件发送给朋友。"
[输出]
{
  "subtasks": [
    {"function": "GetJokeTopics"},
    {"function": "for-each",
     "args": {
       "list": "output_GetJokeTopics",
       "index": "topic",
       "function":
         {
           "function": "Joke",
           "args": {"topic": "topic"}}}},
    {"function": "EmailTo",
     "args": {
       "text": "for-each_output_GetJokeTopics",
       "recipient": "friend"}}
  ]
}
# 还有 2 个例子,但未在此列表中列出
【特殊功能】 #3
对于每个
描述:对列表中的每个项目执行一个函数
参数:
- 函数:要执行的函数
- 迭代器:要迭代的项列表
- index:列表中当前项的参数名称
【可用功能】 #4
{{$available_functions}}
[目标]
{{$goal}} #5
请务必只使用可用函数列表中的函数。
计划应尽可能简短。
仅以 JSON 格式返回计划。
[输出] #6
You are a planner for Nexus. #1
Your job is to create a properly formatted JSON plan step by step, to
satisfy the goal given.
Create a list of subtasks based off the [GOAL] provided.
Each subtask must be from within the [AVAILABLE FUNCTIONS] list. Do not
use any functions that are not in the list.
Base your decisions on which functions to use from the description and the
name of the function.
Sometimes, a function may take arguments. Provide them if necessary.
The plan should be as short as possible.
You will also be given a list of corrective, suggestive and epistemic
feedback from previous plans to help you make your decision.
For example:
[SPECIAL FUNCTIONS] #2
for-each- prefix
description: execute a function for each item in a list
args:
- function: the function to execute
- list: the list of items to iterate over
- index: the arg name for the current item in the list
[AVAILABLE FUNCTIONS]
GetJokeTopics
description: Get a list ([str]) of joke topics
EmailTo
description: email the input text to a recipient
args:
- text: the text to email
- recipient: the recipient's email address. Multiple addresses may be
included if separated by ';'.
Summarize
description: summarize input text
args:
- text: the text to summarize
Joke
description: Generate a funny joke
args:
- topic: the topic to generate a joke about
[GOAL]
"Get a list of joke topics and generate a different joke for each topic.
Email the jokes to a friend."
[OUTPUT]
{
"subtasks": [
{"function": "GetJokeTopics"},
{"function": "for-each",
"args": {
"list": "output_GetJokeTopics",
"index": "topic",
"function":
{
"function": "Joke",
"args": {"topic": "topic"}}}},
{
"function": "EmailTo",
"args": {
"text": "for-each_output_GetJokeTopics",
"recipient": "friend"}}
]
}
# 2 more examples are given but omitted from this listing
[SPECIAL FUNCTIONS] #3
for-each
description: execute a function for each item in a list
args:
- function: the function to execute
- iterator: the list of items to iterate over
- index: the arg name for the current item in the list
[AVAILABLE FUNCTIONS] #4
{{$available_functions}}
[GOAL]
{{$goal}} #5
Be sure to only use functions from the list of available functions.
The plan should be as short as possible.
And only return the plan in JSON format.
[OUTPUT] #6
图 11.7 展示了规划提示的构建和运行过程,从构建到执行,最终将结果返回给用户。规划器的工作原理是:构建规划提示,将其提交给 LLM 以构建计划,在本地解析和执行该计划,将结果返回给 LLM 进行评估和总结,最后将最终输出返回给用户。
Figure 11.7 shows the process of building and running a planning prompt, from building to execution to finally returning the results to the user. Planners work by building a planning prompt, submitting it to an LLM to construct the plan, parsing and executing the plan locally, returning the results to an LLM to evaluate and summarize, and finally returning the final output back to the user.
规划过程中的一些细微之处需要特别注意。通常情况下,规划是在独立环境中构建的,不会添加上下文历史记录。这样做是为了专注于目标,因为大多数规划提示都会消耗大量令牌。执行器中的函数通常在本地环境中执行,可能包括调用 API、执行代码,甚至运行机器学习模型。
It’s essential to notice a few subtle details about the planning process. Typically, the plan is built in isolation by not adding context history. This is done to focus on the goal because most planning prompts consume many tokens. Executing the functions within the executor is usually done in a local environment and may include calling APIs, executing code, or even running machine learning models.
清单 11.4 展示了 BasicNexusPlanner 类中 create_plan 函数的代码;LangChain 和 SK 等工具也使用了类似的模式。该过程将代理的操作加载为字符串,然后使用 PromptTemplateManager 将目标和可用函数列表插入到规划器提示模板中;PromptTemplateManager 只是模板处理代码的一个包装器。模板处理可以使用简单的正则表达式完成,也可以使用 Jinja2、Handlebars 或 Mustache 等工具实现更复杂的功能。
Listing 11.4 shows the code for the create_plan function from the BasicNexusPlanner class; tools such as LangChain and SK use similar patterns. The process loads the agent’s actions as a string. The goal and available functions list are then inserted into the planner prompt template using the PromptTemplateManager, which is just a wrapper for the template-handling code. Template handling is done with simple regex but can also be more sophisticated using tools such as Jinja2, Handlebars, or Mustache.
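作为上文模板处理的一个最小示意,render_prompt 可以用一次正则替换实现(类名 PromptTemplateManager 取自下面的清单;方法体是假设性的示意,并非 Nexus 的实际代码):
As a minimal sketch of the template handling described above, `render_prompt` can be implemented with a single regex substitution (the class name `PromptTemplateManager` comes from the listing below; the body is an assumed sketch, not Nexus’s actual code):

```python
import re

class PromptTemplateManager:
    """Minimal template renderer: replaces {{$name}} tokens with context values."""

    def render_prompt(self, template: str, context: dict) -> str:
        # Substitute each {{$key}} token with its value from the context dict,
        # leaving unknown tokens intact.
        def substitute(match: re.Match) -> str:
            key = match.group(1)
            return str(context.get(key, match.group(0)))

        return re.sub(r"\{\{\$(\w+)\}\}", substitute, template)

ptm = PromptTemplateManager()
rendered = ptm.render_prompt("[GOAL]\n{{$goal}}", {"goal": "Search Wikipedia"})
```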
basic_nexus_planner.py (create_plan)
def create_plan(self, nexus, agent, goal: str, prompt: str = PROMPT) -> Plan:
selected_actions = nexus.get_actions(agent.actions)
available_functions_string = "\n\n".join(
format_action(action) for action in selected_actions
) #1
context = {} #2
context["goal"] = goal
context["available_functions"] = available_functions_string
ptm = PromptTemplateManager() #3
prompt = ptm.render_prompt(prompt, context)
plan_text = nexus.execute_prompt(agent, prompt) #4
return Plan(prompt=prompt,
goal=goal,
plan_text=plan_text) #5
清单 11.5 中所示的执行计划的代码解析 JSON 字符串并执行函数。执行计划时,代码会检测特定for-each函数,该函数遍历列表并执行每个元素。每次函数执行的结果都会添加到上下文中。此上下文会传递给每个函数调用,并作为最终输出返回。
The code to execute the plan, shown in listing 11.5, parses the JSON string and executes the functions. When executing the plan, the code detects the particular for-each function, which iterates through a list and executes each element in a function. The results of each function execution are added to the context. This context is passed to each function call and returned as the final output.
basic_nexus_planner.py (execute_plan)
def execute_plan(self, nexus, agent, plan: Plan) -> str:
context = {}
plan = plan.generated_plan
for task in plan["subtasks"]: #1
if task["function"] == "for-each": #2
list_name = task["args"]["list"]
index_name = task["args"]["index"]
inner_task = task["args"]["function"]
list_value = context.get(list_name, [])
for item in list_value:
context[index_name] = item
result = nexus.execute_task(agent, inner_task, context)
context[f"for-each_{list_name}_{item}"] = result
for_each_output = [ #2
context[f"for-each_{list_name}_{item}"] ↪
for item in list_value
]
context[f"for-each_{list_name}"] = for_each_output
for item in list_value: #3
del context[f"for-each_{list_name}_{item}"]
else:
result = nexus.execute_task(agent,
task,
context) #4
context[f"output_{task['function']}"] = result
return context #5
整个执行过程返回的上下文信息会通过最后一次调用发送给 LLM,LLM 会汇总结果并返回响应。如果一切按计划进行,LLM 将返回结果摘要。如果出现错误或缺少某些内容,LLM 可能会尝试修复问题或通知用户错误情况。
The returned context from the entire execution is sent in a final call to the LLM, which summarizes the results and returns a response. If everything goes as planned, the LLM will respond with a summary of the results. If there is an error or something is missing, the LLM may try to fix the problem or inform the user of the error.
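下面是对这一最终总结调用的假设性示意(execute_prompt 的用法仿照清单 11.4;提示词的措辞只是假设,并非 Nexus 的实际提示):
Here is a hedged sketch of that final summarization call (the `execute_prompt` usage mirrors listing 11.4; the prompt wording is an assumption, not Nexus’s actual prompt):

```python
def summarize_plan_results(execute_prompt, goal: str, context: dict) -> str:
    """Send the goal and the executed plan's context back to the LLM for a summary."""
    prompt = (
        f"[GOAL]\n{goal}\n\n"
        f"[RESULTS]\n{context}\n\n"
        "Summarize the results above for the user, "
        "noting whether the goal was achieved."
    )
    return execute_prompt(prompt)
```

由于只发送目标和输出上下文,LLM 并不会看到各个工具调用本身,这与上文所述的本地执行方式一致。
Because only the goal and the output context are sent, the LLM never sees the individual tool calls, which matches the local execution described above.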
现在我们再次打开 Nexus,测试一下规划器的运行情况。加载上次使用的同一个代理,但这次在“高级选项”中选择规划器,如图 11.8 所示。然后,像之前一样输入目标提示,让代理执行该任务。
Let’s now open Nexus again and test a planner in operation. Load up the same agent you used last time, but select the planner under the Advanced options this time, as shown in figure 11.8. Then, enter the goal prompt as you did before, and let the agent take it away.
几分钟后,代理程序会返回已保存的文件,在某些情况下,它还会提供额外信息,例如后续步骤以及如何处理输出结果。这是因为代理程序已获得其完成任务的概览。但请记住,计划执行是在本地层面进行的,只有上下文、计划和目标被发送到 LLM。
After a few minutes, the agent returns with the saved file, and in some cases, it may provide extra information, such as the next steps and what to do with the output. This is because the agent was given a high-level overview of what it accomplished. Remember, though, that plan execution is done at the local level, and only context, plan, and goal were sent to the LLM.
这意味着计划执行可以由任何进程完成,不一定非得由代理完成。在LLM之外执行计划可以减少代理执行计划所需的令牌和工具使用量。这也意味着LLM无需支持工具使用即可使用计划器。
This means that plan execution can be completed by any process, not necessarily by the agent. Executing a plan outside the LLM reduces the tokens and tool use the agent needs to perform. This also means that an LLM doesn’t need to support tool usage to use a planner.
在 Nexus 内部,当启用规划器时,代理引擎工具会被绕过。规划器会完成动作的执行,而代理仅通过传递输出上下文来感知这些动作。这对于支持工具使用但无法进行规划的模型来说是有利的。然而,对于既支持工具使用又支持规划的模型(例如 Claude),规划器可能会限制其功能。
Internally, when a planner is enabled within Nexus, the agent engine tool is bypassed. Instead, the planner completes the action execution, and the agent is only aware of the actions through the passing of the output context. This can be good for models that support tool use but can’t plan. However, a planner may limit functionality for models that support both tool use and planning, such as Claude.
一般来说,你需要了解你所使用的 LLM 的功能。如果你不确定这些细节,也可以通过一些尝试来找到答案。让代理分别在启用和禁用规划功能的情况下完成一个多步骤目标,然后查看结果。
In general, you’ll want to understand the capabilities of the LLM you’re using. If you’re unsure of those details, then a little trial and error can also work. Ask the agent to complete a multistep goal with and without planning enabled, and then see the results.
规划使智能体能够完成多个顺序任务,从而实现更复杂的目标。外部规划或提示规划的问题在于它绕过了反馈迭代循环,而反馈迭代循环有助于快速纠正问题。正因如此,OpenAI 和其他公司现在正在将推理和规划直接集成到 LLM 层,我们将在下一节中看到这一点。
Planning allows agents to complete multiple sequential tasks to achieve more complex goals. The problem with external or prompt planning is that it bypasses the feedback iteration loop, which can help correct problems quickly. Because of this, OpenAI and others are now directly integrating reasoning and planning at the LLM level, as we’ll see in the next section.
代号为"草莓"(Strawberry)的 o1-preview 模型的发布,标志着可用于智能体系统的大语言模型(LLM)类型发生了巨大转变。Strawberry 不仅被宣称在数学、科学和一般计算任务上更高效,还能够直接在 LLM 内部进行推理、规划、评估和反馈。
The release of the o1-preview model, code named Strawberry, introduced a dramatic shift in the type of LLMs becoming available for agentic systems. Strawberry was not only proclaimed to be more efficient at math, science, and general calculation tasks but also able to engage in reasoning, planning, evaluation, and feedback directly in the LLM.
回顾一下我们在第 10 章中提到的时间旅行问题,如图 11.9 所示。如果你还记得,这个问题用 GPT-4 和其他类似的大语言模型很难解决。然而,通过应用推理和反馈,我们得以产生偶尔正确的输出。
Consider our time travel problem from chapter 10 and shown again in figure 11.9. If you recall, this problem was difficult to solve using GPT-4 and other similar LLMs. However, with the application of reasoning and feedback, we were able to produce output that was occasionally correct.
作为一项实验,请使用 o1-preview 模型将此问题输入到 ChatGPT 中,如清单 11.6 所示。稍等片刻,等待答案。没错,模型仍然答错了。
As an experiment, enter this problem into ChatGPT using the o1-preview model, as shown in listing 11.6. Sit back for a few seconds and wait for the answer. Yep, the model still gets it wrong.
在一部科幻电影中,亚历克斯是一位时间旅行者,他决定回到过去,亲眼见证 100 年前发生的一场著名历史战役,这场战斗持续了 10 天。他在战斗开始前 3 天到达。然而,在过去待了 6 天之后,他向未来跳跃了 50 年,并在那里待了 20 天。然后,他返回过去见证战斗的结束。在看到战斗结束之前,亚历克斯在过去待了多少天?
In a sci-fi film, Alex is a time traveler who decides to go back in time to witness a famous historical battle that took place 100 years ago, which lasted for 10 days. He arrives three days before the battle starts. However, after spending six days in the past, he jumps forward in time by 50 years and stays there for 20 days. Then, he travels back to witness the end of the battle. How many days does Alex spend in the past before he sees the end of the battle?
虽然看到模型给出错误答案可能会有些令人失望,但它在分解问题和展示答案方面做得更好。清单 11.7 展示了将清单 11.6 中的问题交给 Strawberry 模型后的示例输出。请注意,由于 LLM 的随机性,您可能会得到不同的答案。
While it may be somewhat disappointing to see the model get the wrong answer, it does, however, do a far better job of breaking down the problem and demonstrating its answer. Listing 11.7 shows the sample output from posing the problem in listing 11.6 to the Strawberry model. Note, you may get a different answer because of the stochastic nature of the LLM.
为了确定亚历克斯在目睹战斗结束之前在过去待了多少天,让我们一步步重构时间线:
抵达过去:
亚历克斯在战斗开始前 3 天到达。
他花了这 3 天等待战斗开始。
战斗期间:
战斗开始,亚历克斯经历了战斗的前 3 天。
至此,他总共在过去待了 6 天(战斗前 3 天加上战斗期间 3 天)。
向未来跳跃: #1
这 6 天之后,亚历克斯向未来跳跃了 50 年。
他在这个未来时间待了 20 天。
返回过去:
随后,亚历克斯返回过去,见证战斗的结束。
由于战斗持续 10 天,而他在战斗第 3 天后离开,战斗还剩 7 天。
然而,因为他离开了 20 天,超过了战斗剩余的 7 天,他必须返回到战斗结束之前的某个时间点才能见证其结局。
为了看到战斗的结束,他返回到战斗的最后一天。
战斗结束前的最后几天:
返回后,他又在过去待了 1 天,见证战斗的结束。
在看到结局之前在过去花费的总天数:
向未来跳跃前 6 天。
返回过去后 1 天。
总计:6 + 1 = 7 天。
答案:7
To determine how many days Alex spends in the past before witnessing the end of the battle, let's reconstruct the timeline step by step: Arrival in the Past: Alex arrives 3 days before the battle begins. He spends these 3 days waiting for the battle to start. During the Battle: The battle starts, and Alex experiences the first 3 days of it. At this point, he has spent a total of 6 days in the past (3 days before the battle and 3 days during the battle). Time Jump Forward: #1 After these 6 days, Alex jumps 50 years forward in time. He spends 20 days in this future time. Return to the Past: Alex then returns to the past to witness the end of the battle. Since the battle lasts 10 days and he left after day 3 of the battle, there are 7 days remaining in the battle. However, because he was away for 20 days, which is longer than the remaining 7 days of the battle, he must return to a point in time before the battle ends to witness its conclusion. To see the end of the battle, he returns to the past on the last day of the battle. Final Days Spent Before the Battle Ends: Upon returning, he spends 1 more day in the past to witness the battle's end. Total Days Spent in the Past Before Seeing the End: 6 days before jumping forward in time. 1 day after returning to the past. Total: 6 + 1 = 7 days. Answer: 7
因为我们知道正确答案是 27,所以我们知道 LLM 模型是错误的;但如果我们不知道正确答案,我们很容易就会认为整个计算过程和推理都是正确的。当我们移除 LLM 交互和智能体系统中的反馈机制时,就会出现这类问题。反馈机制可以引导模型进行自我修正。
Because we know the right answer is 27, we know the LLM is wrong, but if we didn’t, we could just as easily assume that the work and reasoning were all correct. Problems like this can happen when we remove feedback in LLM interactions and agentic systems. Feedback can guide the model to correct itself.
然而,如果我们不知道正确答案是 27(假设他没有花一整天时间观看战斗,则答案是 26),而只是假设 LLM 或智能体给出的答案是正确的,该怎么办呢?其实,我们可以通过几个简单的提示来解决这个问题,这些提示可以引导 LLM 进行推理和规划反馈。不过,这些技巧对于像 OpenAI Assistant 这样的 LLM 或封装器来说更为有效,因为它们在模型内部就提供了推理和规划功能。
However, what if we didn’t know the correct answer was 27 (26, if you assume he doesn’t spend the day to witness the battle) and assumed the LLM or agent was correct? Well, this is a problem we can rectify with a couple of simple prompts that can engage the LLM in reasoning and planning feedback. However, these techniques are more effective with LLMs or wrappers such as the OpenAI Assistants, which provide reasoning and planning within the model.
我们希望向LLM提供反馈,但理解反馈内容对我们来说可能比较困难。幸运的是,只要我们给出正确答案,就可以直接从LLM那里获得反馈。清单11.8展示了如何就我们的时间旅行问题从LLM获得建设性反馈。
What we want to do is provide feedback to the LLM, but understanding what that feedback is will likely be difficult for us. Fortunately, we can elicit feedback directly from the LLM, provided we give the correct answer. Listing 11.8 shows how to generate constructive feedback from the LLM concerning our time travel problem.
正确答案是 27,请回顾你错在哪里,并提出在未来解决类似问题时可以给自己的反馈建议。
the correct answer is 27, please review what you did wrong and suggest feedback you could give yourself when trying to solve similar future problems
在模型回答错误后输入此信息,即可生成反馈,您可以利用这些反馈通过提示或作为系统指令的一部分来指导模型。清单 11.9 展示了 o1-preview 提供的反馈示例。下次需要处理复杂的时间旅行问题时,您可以提取此反馈并完善指令。
Entering this after the model answers the question wrong will generate feedback that you can use to guide the model through prompting or as part of system instructions. Listing 11.9 shows an example of the feedback provided by o1-preview. You can then extract this feedback and augment the instructions the next time you want to tackle complex time travel problems.
对未来问题的反馈
理解时间旅行中的时间框架:
相对时间框架:始终相对于原始参考点考虑各个时间段。在这个问题中,即使亚历克斯向前跳跃了 50 年,相对于他原本的"现在",他仍然处于过去。
包含所有相关时间段:
确保将在过去(或未来)度过的所有时间段都计入总数,即使它们涉及移动到不同的时间点。
时间连续性:
请记住,当旅行者穿越时空时,除非另有说明,其他时间段的事件仍在继续推进。
应用:
仔细阅读:密切关注题目提供的细节,例如持续时间和时间跳跃的顺序。
可视化:绘制时间轴或许有助于形象化地展现事件顺序和涉及的时间段。
复核计算:解题之后,重新检查每个步骤,确保所有时间段都已正确计入。
Feedback for Future Problems Understanding Time Frames in Time Travel: Relative Time Frames: Always consider the time periods relative to the original point of reference. In this problem, even though Alex jumps forward 50 years, he is still in the past relative to his original present time. Including All Relevant Periods: Ensure that all time periods spent in the past (or future) are included in the total calculation, even if they involve moving to different points in time. Time Continuity: Remember that while the traveler moves through time, events in other time periods continue to progress unless specified otherwise. Application: Careful Reading: Pay close attention to the details provided, such as the durations and the sequences of time jumps. Visualization: It may help to draw a timeline to visualize the sequence of events and time periods involved. Double-Checking Calculations: After solving, revisit each step to ensure all time periods are accounted for correctly.
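把这种提示-反馈循环写成代码大致如下(ask_llm 是一个假设的调用 LLM 的函数;整个流程只是示意,并非书中 Nexus 的实现):
Sketched as code, this prompt-feedback loop from listings 11.8 and 11.9 might look like the following (`ask_llm` is a hypothetical function that calls the LLM; the whole flow is an illustration, not the book's Nexus implementation):

```python
def solve_with_feedback(ask_llm, problem: str, correct_answer: str,
                        max_rounds: int = 3):
    """Re-prompt with self-generated feedback until the answer is correct."""
    instructions = ""
    answer = None
    for _ in range(max_rounds):
        # Ask the question, prefixed with any feedback gathered so far.
        answer = ask_llm(instructions + problem)
        if answer == correct_answer:
            break
        # Elicit feedback from the model itself, as in listing 11.8,
        # and fold it into the instructions for the next attempt.
        instructions += ask_llm(
            f"The correct answer is {correct_answer}. Review what you did wrong "
            "and suggest feedback for solving similar future problems."
        ) + "\n"
    return answer, instructions
```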
这种反馈技术在 o1-preview 这类模型上会稳定奏效,但其他模型即使得到这种反馈,可能仍然难以给出正确答案。随着时间的推移,模型越来越智能,这种技术很可能在大多数模型上普遍适用。然而,即使模型越来越智能,这种反馈机制可能仍然至关重要,因为语言是微妙的,我们向 LLM 提出的问题并非都有显而易见的绝对答案。以我们的示例问题为例,它很好地说明了解题者需要做出假设并从题目中推导关联。从地质学到行为科学,科学领域仍有许多地方,同一个问题可能得出多种答案。接下来,我们来看看如何将推理、规划、评估和反馈应用于智能体系统。
This feedback technique will consistently work on models such as o1-preview, but other models may still struggle to answer correctly, even given this feedback. Over time, as models become smarter, this technique will likely work on most models. However, this feedback mechanism will likely remain essential even as models get progressively smarter, because language is nuanced, and not every problem we challenge LLMs with has an obvious, absolute answer. Take our example problem, for instance. It’s an excellent example of a problem that requires the solver to make assumptions and draw correlations from the question. There are still plenty of areas in science, from geology to behavioral science, where answering the same problem may yield a range of answers. Let’s look next at how reasoning, planning, evaluation, and feedback can be applied to agentic systems.
在前几章中,我们探讨了如何实现规划、推理、反馈和评估等智能体组成部分。现在,我们将探讨如何、何时以及在何处将这些组成部分集成到用于实时生产、研究或开发的辅助和智能体系统中。
In recent chapters, we’ve examined how the agentic components of planning, reasoning, feedback, and evaluation can be implemented. Now we look at how, when, and where those components can be integrated into assistant and agentic systems for real-time production, research, or development.
虽然并非所有组件都适用于所有应用场景,但了解何时何地应用哪个组件仍然十分有用。下一节,我们将探讨如何将规划功能集成到助手/代理系统中。
While not all of these components may fit the same into every application, it’s useful to understand where and when to apply which component. In the next section, we look at how planning can be integrated into assistant/agentic systems.
规划是指助手或代理能够计划执行一系列任务的组件,无论这些任务是串行、并行还是其他组合。我们通常将规划与工具的使用联系起来,而且理所当然地,任何使用工具的系统都需要一个功能强大的规划器。然而,并非所有系统都具有相同的功能,因此在表 11.1 中,我们将回顾在何处、何时以及如何实施规划器。
Planning is the component where an assistant or agent can plan to undertake a set of tasks, whether they are in series, parallel, or some other combination. We typically associate planning with tool use, and, rightfully, any system using tools will likely want a capable planner. However, not all systems are created equally, so in table 11.1, we’ll review where, when, and how to implement planners.
| 应用 | 实施位置 | 环境 | 目的 | 时机 | 配置 |
|---|---|---|---|---|---|
| 私人助理 | 在提示中或 LLM 内 | 个人设备 | 促进工具使用 | 在响应过程中 | 作为提示或 LLM 的一部分 |
| 客服机器人 | 不常见 | 受限环境,禁止使用工具 | | | |
| 自主代理 | 作为代理提示的一部分,并在 LLM 内 | 服务器或服务 | 促进复杂工具的使用和任务规划 | 构建代理时和/或响应期间 | 在代理或 LLM 内 |
| 协作工作流程 | 在 LLM 内 | 共享画布或编码 | 促进复杂工具的使用 | 在响应过程中 | 在 LLM 内 |
| 游戏 AI | 在 LLM 内 | 服务器或应用程序 | 复杂工具的使用和规划 | 在响应之前或期间 | 在 LLM 内 |
| 研究 | 任何地方 | 服务器 | 促进工具使用并参与复杂的任务工作流程 | 在响应生成之前、期间和之后 | 任何地方 |
表 11.1 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中运用规划:
Table 11.1 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how planning may be employed in each application:
如您所见,规划是许多LLM应用的关键组成部分,无论是通过协调工具的使用,还是其他方式。下一节,我们将探讨推理的下一个组成部分,以及如何将其应用于同一应用栈。
As you can see, planning is an essential part of many LLM applications, whether through coordination of tool use or otherwise. In the next section, we look at the next component of reasoning and how it can be applied to the same application stack.
推理虽然通常与规划和任务完成密切相关,但它本身也可以是一个独立的组件。随着大语言模型(LLM)的成熟和智能化,推理功能通常会被集成到 LLM 本身。然而,并非所有应用都能从大量推理中获益,因为它往往会在 LLM 的响应中引入一个思考周期。表 11.2 概述了如何将推理组件集成到各种 LLM 应用类型中。
Reasoning, while often strongly associated with planning and task completion, is a component that can also stand by itself. As LLMs mature and get smarter, reasoning is often included within the LLM itself. However, not all applications may benefit from extensive reasoning, as it often introduces a thinking cycle within the LLM response. Table 11.2 describes at a high level how the reasoning component can be integrated with various LLM application types.
| 应用 | 实施位置 | 环境 | 目的 | 时机 | 配置 |
|---|---|---|---|---|---|
| 私人助理 | 在 LLM 内 | 个人设备 | 将工作分解成若干步骤 | 在响应过程中 | 作为提示或 LLM 的一部分 |
| 客服机器人 | 不常见;通常只是提供信息 | 工具使用受限,需要使用复合工具 | | | |
| 自主代理 | 作为代理提示的一部分,并在 LLM 内 | 服务器或服务 | 促进复杂工具的使用和任务规划 | 作为 LLM 的一部分;外部推理并不适用 | 在代理或 LLM 内 |
| 协作工作流程 | 在 LLM 内 | 共享画布或编码 | 协助分解工作 | 在响应过程中 | 在 LLM 内 |
| 游戏 AI | 在 LLM 内 | 服务器或应用程序 | 对于开展复杂行动至关重要 | 在响应之前或期间 | 在 LLM 内 |
| 研究 | 任何地方 | 服务器 | 了解如何解决复杂问题并参与复杂的任务工作流程 | 在响应生成之前、期间和之后 | 任何地方 |
表 11.2 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中运用推理:
Table 11.2 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how reasoning may be employed in each application:
虽然我们通常将推理与规划结合起来考虑,但在某些情况下,二者的实施层级可能有所不同。下一节我们将探讨评估各种应用程序的智能体支柱。
While we often consider reasoning in tandem with planning, there may be conditions where the level at which each is implemented may differ. In the next section we consider the agent pillar of evaluation of various applications.
评估是智能体/助手系统中能够衡量系统表现的组件。虽然我们已经演示了如何在一些智能体工作流程中纳入评估,但评估通常是智能体系统的外部组件。然而,它也是大多数 LLM 应用的核心组件,在大多数开发过程中都不应被忽视。表 11.3 概述了如何将评估组件集成到各种 LLM 应用类型中。
Evaluation is the component of agentic/assistant systems that can guide how well the system performs. While we demonstrated incorporating evaluation in some agentic workflows, evaluation is often an external component in agentic systems. However, it’s also a core component of most LLM applications and not something that should be overlooked in most developments. Table 11.3 describes at a high level how the evaluation component can be integrated with various LLM application types.
| 应用 | 实施位置 | 环境 | 目的 | 时机 | 配置 |
|---|---|---|---|---|---|
| 私人助理 | 外部 | 服务器 | 评估系统运行状况 | 互动之后 | 通常在外部开发 |
| 客服机器人 | 外部 | 服务器 | 评估每次互动的成功率 | 互动之后 | 代理系统外部 |
| 自主代理 | 外部或内部 | 服务器或服务 | 在任务完成后或完成过程中评估系统的成功程度 | 互动之后 | 外部或内部 |
| 协作工作流程 | 外部 | 共享画布或编码 | 评估协作的成功程度 | 互动之后 | 外部服务 |
| 游戏 AI | 外部或内部 | 服务器或应用程序 | 评估代理,或评估策略、行动的成功程度 | 互动之后 | 外部,或作为该代理或其他代理的一部分 |
| 研究 | 人工与 LLM 结合 | 服务器和人 | 评估已开展的研究成果 | 生成输出后 | 取决于问题的复杂程度和所进行的研究 |
表 11.3 展示了几个不同的应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中进行评估:
Table 11.3 shows several varied application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how evaluation may be employed in each application:
评估是任何智能体或助手系统不可或缺的组成部分,尤其当该系统向用户提供真实且基础的信息时更是如此。为智能体和助手开发评估系统或许足以单独成书。在本章最后一节中,我们将探讨各种学习生命周期管理(LLM)应用中的反馈实现。
Evaluation is an essential element to any agentic or assistant system, especially if that system provides real and fundamental information to users. Developing evaluation systems for agents and assistants is likely something that could or should have its own book. In the final section of this chapter, we’ll look at feedback implementation for various LLM applications.
作为智能体系统的一个组件,反馈通常(即便并非总是)以外部组件的形式实现,至少目前是如此。或许随着人们对评估系统的信心不断增强,反馈机制能够被常规地整合到此类系统中。表 11.4 展示了如何在各种 LLM 应用中实现反馈。
Feedback as a component of agentic systems is often, if not always, implemented as an external component—at least for now. Perhaps confidence in evaluation systems may improve to the point where feedback is regularly incorporated into such systems. Table 11.4 showcases how feedback can be implemented into various LLM applications.
| 应用 | 实施位置 | 环境 | 目的 | 时机 | 配置 |
|---|---|---|---|---|---|
| 私人助理 | 外部或由用户 | 汇总到服务器,或作为系统的一部分 | 提供系统改进的手段 | 互动之后或互动过程中 | 内部和外部 |
| 客服机器人 | 外部 | 汇总到服务器 | 筛选并提供系统改进的途径 | 互动之后 | 代理系统外部 |
| 自主代理 | 外部 | 在服务器端汇总 | 为系统改进提供途径 | 互动之后 | 外部 |
| 协作工作流程 | 在互动过程中 | 共享画布或编码 | 提供即时反馈机制 | 在互动过程中 | 外部服务 |
| 游戏 AI | 外部或内部 | 服务器或应用程序 | 作为内部评估反馈的一部分,支持动态改进 | 互动之后或互动过程中 | 外部,或作为该代理或其他代理的一部分 |
| 研究 | 人工与 LLM 结合 | 服务器和人 | 评估已开展的研究成果 | 生成输出后 | 取决于问题的复杂程度和所进行的研究 |
表 11.4 列出了几个应用场景,在这些场景中,我们可能会部署助手或代理来提供某种程度的帮助。为了提供更多信息和指导,该列表详细说明了如何在每种应用中使用反馈:
Table 11.4 shows several application scenarios in which we may find an assistant or agent deployed to assist in some capacity. To provide further information and guidance, this list provides more details about how feedback may be employed in each application:
反馈是智能体和辅助系统的另一个重要组成部分,但并非在首次发布时就必须提供。然而,从长远来看,引入严格的反馈和评估机制能够极大地促进智能体系统的持续监控,并增强其改进系统各个方面的信心。
Feedback is another powerful component of agentic and assistant systems, but it’s not always required on the first release. However, incorporating rigorous feedback and evaluation mechanisms can greatly benefit agentic systems in the long term concerning ongoing monitoring and providing the confidence to improve various aspects of the system.
在您的代理系统中,如何实现这些组件,部分取决于您所选代理平台的架构。现在您已经了解了每个组件的细微差别,也就具备了选择适合您的应用和业务用例的代理系统的知识。无论您的应用是什么,几乎在所有情况下,您都需要使用多个代理组件。
How you implement each of these components in your agentic systems may, in part, be guided by the architecture of your chosen agentic platform. Now that you understand the nuances of each component, you also have the knowledge to guide you in selecting the right agent system that fits your application and business use case. Regardless of your application, you’ll want to employ several agentic components in almost all cases.
As agentic systems mature and LLMs themselves get smarter, some of the components we consider external today may become closely integrated. We’ve already seen reasoning and planning integrated into models such as Strawberry. Certainly, as we approach the theoretical artificial general intelligence milestone, we may see models capable of performing long-term self-evaluation and feedback.
In any case, I hope you enjoyed this journey with me into this incredible frontier of a new and emerging technology that will certainly alter our perception of work and how we undertake it through agents.
Use the following exercises to improve your knowledge of the material:
Objective —Learn how to implement a basic planning agent using a prompt to generate a sequence of actions.
Tasks:
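One way to approach this exercise is to build the planner around a prompt template plus a parser for the numbered action list the model returns. The prompt wording and the `fake_llm` stub below are assumptions for illustration; swap in a real LLM call to complete the exercise:

```python
import re

# Hypothetical prompt template; adjust wording to your model.
PLANNER_PROMPT = """You are a planner. Given a goal, reply with a numbered
list of actions, one per line, using only these tools: {tools}.
Goal: {goal}"""


def parse_plan(text):
    """Extract 'N. action' or 'N) action' lines into a list of steps."""
    steps = []
    for line in text.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps


def plan(goal, tools, llm):
    """llm is any callable taking a prompt string and returning text."""
    reply = llm(PLANNER_PROMPT.format(tools=", ".join(tools), goal=goal))
    return parse_plan(reply)


# A stubbed model reply stands in for a real LLM call here.
def fake_llm(prompt):
    return "1. search for flights\n2. compare prices\n3. book the cheapest"


print(plan("book a flight", ["search", "compare", "book"], fake_llm))
# → ['search for flights', 'compare prices', 'book the cheapest']
```

The parsed steps can then be dispatched one at a time to your action executor, which is the essence of prompt-driven sequential planning.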
Objective —Understand how feedback mechanisms can improve the performance of an agentic system.
Tasks:
Objective—Learn the difference between parallel and sequential actions and how they affect agent behavior.
Tasks:
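The distinction can be demonstrated without an LLM at all: the sketch below runs the same three simulated actions sequentially and in parallel with asyncio, and the timing difference mirrors what you would see with real tool calls. The action names and delays are illustrative:

```python
import asyncio
import time


async def fetch(name, delay):
    """Stand-in for an agent action such as a tool or API call."""
    await asyncio.sleep(delay)
    return name


async def sequential(actions):
    """Run actions one after another; total time is the sum of delays."""
    results = []
    for name, delay in actions:
        results.append(await fetch(name, delay))
    return results


async def parallel(actions):
    """Run actions concurrently; total time is roughly the longest delay."""
    return await asyncio.gather(*(fetch(n, d) for n, d in actions))


actions = [("weather", 0.2), ("news", 0.2), ("stocks", 0.2)]

start = time.perf_counter()
print(asyncio.run(sequential(actions)))  # ~0.6 s total
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
print(asyncio.run(parallel(actions)))    # ~0.2 s total
print(f"parallel: {time.perf_counter() - start:.2f}s")
```

Parallel execution suits independent actions; sequential execution is required when one action's output feeds the next, which is why planners distinguish the two.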
Objective —Learn how to build a custom planner and integrate it into an agent platform.
Tasks:
Objective —Learn how to implement error handling and feedback to refine sequential planning in an agentic system.
Tasks:
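A minimal shape for this exercise is a retry loop that feeds each step's error message back into the next attempt. The function and toy executor below are hypothetical stand-ins for an agent's real step executor:

```python
def run_plan_with_feedback(steps, execute, max_retries=2):
    """Execute steps in order; on failure, retry the step with the error
    message fed back so the executor can adjust its next attempt.

    execute(step, feedback) should return a result string or raise.
    """
    results = []
    for step in steps:
        feedback = None
        for attempt in range(max_retries + 1):
            try:
                results.append(execute(step, feedback))
                break
            except Exception as err:
                feedback = str(err)  # fed back into the next attempt
        else:
            # Exhausted retries without a successful break
            raise RuntimeError(f"step failed after retries: {step}")
    return results


# Toy executor: 'flaky' fails once, then succeeds when it sees feedback.
def toy_execute(step, feedback):
    if step == "flaky" and feedback is None:
        raise ValueError("transient failure")
    return f"done: {step}"


print(run_plan_with_feedback(["plan", "flaky", "report"], toy_execute))
# → ['done: plan', 'done: flaky', 'done: report']
```

In a real agent, the feedback string would be appended to the LLM prompt for the retried step, letting the model correct its own output.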
Although several commercial large language model (LLM) services are available, this book recommends using OpenAI services directly or through Azure OpenAI Studio. To access either service, you must create an account and register a payment method (not covered in this appendix). The GPT-4 family of LLMs is considered best in class and is better suited for agent development. Using open source and alternative services is always an option but is generally advisable only after you’ve worked with GPT-4 for some time.
The following general steps can help you quickly set up using OpenAI LLMs for agent development. Though using OpenAI and other commercial LLMs comes at a price, you can expect to pay less than US$100 to complete all the exercises in this book:
After generating a key, you can continue to use it within an .env configuration file or through other means of registering an OpenAI key. For most of the packages used in this book, configuring OpenAI will generally only require the key. Other services, such as Azure OpenAI, will require the configuration of a model deployment and a base URL as covered in the next section.
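As a sketch of what the .env mechanism does, here is a minimal loader for simple `KEY=value` files. The python-dotenv package is the usual, more robust choice; the demo key written below is a placeholder, not a real credential:

```python
import os
from pathlib import Path


def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' comments ignored.
    (python-dotenv offers a fuller implementation.)"""
    loaded = {}
    env_file = Path(path)
    if not env_file.exists():
        return loaded
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"')
        # Don't overwrite variables already set in the real environment
        os.environ.setdefault(key.strip(), loaded[key.strip()])
    return loaded


# Demo: write a placeholder .env file and load it
Path(".env").write_text('# demo\nOPENAI_API_KEY="sk-demo-123"\n')
print(load_env())  # → {'OPENAI_API_KEY': 'sk-demo-123'}
```

Once the key is in the environment, most OpenAI-aware packages pick up `OPENAI_API_KEY` automatically without further configuration.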
Through its ongoing relationship with OpenAI, Microsoft hosts the same models at the same price within Azure OpenAI Studio. Occasionally, Azure may be a model version behind, but Microsoft generally keeps current with the latest OpenAI models.
These guidelines will be more general because there are several ways to access Azure and methods of creating accounts and accessing the studio (for specific instructions, refer to Microsoft documentation):
Within Azure OpenAI, models are exposed through a resource allocation called a deployment. Deployments wrap a model, such as GPT-4, and provide access to the resource. Figure A.5 shows an example of various models being exposed through deployments.
Again, if you get stuck, the Microsoft documentation can guide you in the right direction. The three critical differences to remember when connecting to a resource such as Azure OpenAI Studio or another LLM using the OpenAI tooling are listed here:
- api key to access the model
- url or endpoint where the model is located
- model or deployment name
If you can’t access a model for whatever reason, a good alternative is open source models. Setting up and consuming open source LLMs is covered in chapter 2.
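Those three differences can be captured side by side. The values below are placeholders (your key, resource name, API version, and deployment name will differ); with current versions of the OpenAI Python SDK, you would pass the connection settings to `OpenAI(...)` or `AzureOpenAI(...)` respectively:

```python
# Connecting to OpenAI directly: the key and model name are enough;
# the base URL defaults to OpenAI's API endpoint.
openai_settings = {
    "api_key": "sk-your-openai-key",   # 1. API key
    "model": "gpt-4",                  # 3. model name
}

# Connecting to Azure OpenAI: the endpoint (URL) and the deployment
# name replace the defaults. All values here are placeholders.
azure_settings = {
    "api_key": "your-azure-key",                                 # 1. API key
    "azure_endpoint": "https://YOUR-RESOURCE.openai.azure.com",  # 2. endpoint
    "api_version": "2024-02-01",
    "model": "my-gpt4-deployment",                               # 3. deployment name
}

print(sorted(set(azure_settings) - set(openai_settings)))
```

Note that on Azure, the `model` argument refers to your deployment name, not the underlying model family, which is the most common point of confusion when switching between the two services.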
While this book assumes readers are experienced Python developers, this could mean many different things. In this appendix, we look at configuring a Python development environment that will function with the code examples in this book. You can use other integrated development environments (IDEs), but not all tooling, especially extensions, will work in all IDEs.
To download and run the source code, install Git, and then pull the repository locally. Here are the high-level steps to pull the code from the book’s GitHub repository:
git clone https://github.com/cxbxmxcx/GPT-Agents.git
Python is provided through different versions and deployments. This book relies on the standard Python installation, version 3.10. Anaconda is another deployment of Python that is very popular and could be used. However, all the material in this book has been run and tested with a Python 3.10 virtual environment:
python --version
The version should be 3.10, but if it isn’t, don't worry. You may have multiple Python versions installed. We’ll also confirm the installation when setting up VS Code.
Installing VS Code is relatively straightforward and can be done in just a few steps:
With VS Code running, we can install the necessary extensions. We’ll cover those extensions next.
Thousands of extensions for VS Code can provide an excellent Python coding environment. The recommended ones are only the start of what you can explore independently. Beware, though, that not all extensions are created equally. When installing new extensions, look at the number of installs and ratings. Extensions with fewer than four stars are generally to be avoided. To install the extensions, follow these steps:
You’ll only need to install the extensions for each VS Code environment you’re running. Typically, this will mean installing for just your operating system installation of VS Code. However, if you run VS Code in containers, you must install extensions for each container you’re running. Working with Python in the Dev Containers extension will be covered later in this appendix.
When developing Python projects, you often want to create isolated virtual environments. This will help in managing multiple package dependencies across various tasks and tools. In this book, it’s recommended that a new virtual environment be created for each new chapter. VS Code can help you create and manage multiple Python environments quickly and efficiently via the following steps:
Ensure the requirements.txt file in the chapter folder is selected. This will install all the requirements for the current chapter.
You should complete these steps for each new chapter of the book. The alternative is to use VS Code development containers, which will be covered in the next section.
When working with advanced agents and agents that can generate and execute code, running them in isolated containers is generally recommended. Container isolation prevents operating system disruption or corruption and provides a base for deploying agents.
Getting familiar with containers and platforms such as Docker can be an extensive undertaking to grasp everything. Fortunately, it takes very little knowledge to start using containers, and VS Code extensions make this even more accessible.
You’ll first need to install a container toolset. Docker is free (provided you use the tool as a hobby or you’re a student) and the most accessible. Follow these instructions to install Docker and get started working with containers:
With Docker and VS Code configured, you can move on to using Dev Containers by following these steps:
After the container is launched, your VS Code environment will be connected. This allows you to develop code on the container without worrying about dependencies not working.
SYMBOLS
A
actions
agent planning
agent platforms
agents
B
C
D
E
F
feedback
functions
G
GPTs (Generative Pretrained Transformers)
I
J
K
L
LM Studio
M
N
Nexus
O
OpenAI
P
profiles
Python
Q
R
S
Streamlit
T
U
V
VS Code
W
X
X (formerly Twitter)
Y
YouTube
Z